worthless443 2 years ago

Automatic memory management is indeed the first thing one needs to look for when writing performance-critical software, and it's first on my checklist. But

> in-memory storage of databases

Doesn't it sound a bit expensive to need large-capacity memory? Admittedly, read/write I/O is far cheaper for in-memory analysis. Is such a trade-off worth it?

  • mbuda 2 years ago

    Excellent observation/question :D It depends; sometimes, it's worth it, and sometimes it is not (as always with tradeoffs). Graphs are a bit specific because most of the traversals or expensive graph analytics like PageRank touch the whole graph (even multiple times) -> the entire graph will end up in memory -> why not keep it in memory for faster performance?

    But for a vast dataset, the hardware cost might be too much. I think we are aware of the tradeoff. We'll probably provide a disk-first storage option at some point because that's definitely a valid setup (sometimes the only possible setup). Ofc, we'll invest time in making it as performant as possible.

    Do you have some specific workload in mind? :D
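
    To make the point about PageRank touching the whole graph concrete, here is a minimal sketch of power-iteration PageRank. It is not Memgraph's implementation; the `pagerank` function, the adjacency-dict format, and the `edge_visits` counter are all assumptions made up for illustration.

```python
def pagerank(out_edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over an adjacency dict {node: [targets]}.

    Returns (ranks, edge_visits), where edge_visits counts every edge read
    during the run. Illustrative only: names and format are hypothetical.
    """
    nodes = set(out_edges)
    for targets in out_edges.values():
        nodes.update(targets)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    edge_visits = 0

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not out_edges.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}

        # Every iteration walks every edge of the graph once.
        for source, targets in out_edges.items():
            if not targets:
                continue
            share = damping * ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
                edge_visits += 1
        ranks = new_ranks

    return ranks, edge_visits


# A tiny graph with 5 edges; "d" has no incoming edges.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks, edge_visits = pagerank(graph, iterations=20)
```

    With 5 edges and 20 iterations, the loop performs 100 edge reads, so the full edge set is re-read on every pass. If those reads had to come from disk each time, the I/O cost would be paid on each iteration, which is the reasoning behind keeping the graph in memory.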

    • worthless443 2 years ago

      If a large graph needs to be read multiple times, then sure, memory bandwidth will give the best possible performance for a workload like PageRank (and optimizing memory allocation and management will boost performance even further).

      So, to my understanding (and a novice one at that), the graph should be stored on disk first, and on initialization the objects are copied once into volatile memory. But here is my question: memory regions are more likely to fault and become corrupted, so wouldn't a graph stored in memory be completely lost as well (unless the results are saved to disk at specific intervals)? Does that make any sense?

      • mbuda 2 years ago

        I'm not sure I understand the part about corruption. How would data in memory become corrupted?

        The way Memgraph currently works: it stores data in memory and asynchronously writes data to disk in small chunks called deltas; later, these chunks are deleted and replaced with a whole-graph snapshot. (There is also a sync option, but that's slower in terms of committing a transaction, i.e., letting the user know the data is written; RocksDB, for example, works similarly.) All disk-related stuff is purely for durability, i.e., recovery after the Memgraph process restarts, and all interactions with the disk happen automatically in the background during standard system runtime and at startup.
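
        A rough sketch of the delta-plus-snapshot idea, for illustration only. This is not Memgraph's code or on-disk format: `TinyStore`, the JSON files, and the file names are hypothetical, and the writes here are synchronous to keep the example short, whereas the description above is about asynchronous background writes.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with a delta log and snapshots.

    Hypothetical illustration; not Memgraph's storage engine.
    """

    def __init__(self, directory):
        self.delta_path = os.path.join(directory, "deltas.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.data = {}
        self._recover()

    def _recover(self):
        # Startup: load the last snapshot, then replay deltas written after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.delta_path):
            with open(self.delta_path) as f:
                for line in f:
                    delta = json.loads(line)
                    self.data[delta["key"]] = delta["value"]

    def put(self, key, value):
        # Update memory first, then append one small delta to disk.
        self.data[key] = value
        with open(self.delta_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")

    def snapshot(self):
        # Write the whole state to a temp file and rename it into place,
        # then delete the deltas that the snapshot now covers.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)
        if os.path.exists(self.delta_path):
            os.remove(self.delta_path)


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.put("alice", ["bob"])
    store.put("bob", ["carol"])
    store.snapshot()                 # deltas so far are replaced by a snapshot
    store.put("carol", ["alice"])    # exists only in memory and the delta log

    # Simulate a process restart: a fresh instance rebuilds state from disk.
    recovered = TinyStore(directory).data
```

        After the simulated restart, `recovered` holds all three entries: two come from the snapshot and the third from replaying the remaining delta. Reads never touch the disk; the files are used only for recovery.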

        • worthless443 2 years ago

          > it stores data in memory, and async starts writing data to disk in small data chunks called deltas, later these chunks are deleted and replaced with the whole graph snapshot

          Thanks, that answers my question about the recoverability of in-memory graphs.

          • mbuda 2 years ago

            Perfect!

mbuda 2 years ago

Is there any interest in a detailed comparison between C++ and Rust in terms of the tradeoffs when implementing/using query modules?

  • ncmncm 2 years ago

    Differences would be about staffing. For any given specialty, having C++ skills too is common.

    Finding somebody with the needed skills and also Rust experience will be impossible, so you would either need to plan on training up some Ruster on the specialty, or hire somebody already up on it with C++ skills and expect them to pick up enough Rust to get by.

    • mbuda 2 years ago

      Yep, from the business perspective that's by far the biggest concern :D

timmy777 2 years ago

Awesome. But how is this different from dgraph?

  • mbuda 2 years ago

    If you are asking about Memgraph in general, overall it's a graph storage + analytics system. Dgraph is probably more on the pure storage side, while Memgraph is more about graph analytics (in-memory graph storage, but it also stores data on disk). In terms of the API, Dgraph exposes GraphQL, while Memgraph is Cypher + Bolt protocol. There is much more; which aspect are you most interested in? :D