wenc 10 months ago

This is good, but it would also be worth mentioning that you're using UMAP for dimensionality reduction with the cosine metric.

https://github.com/Z-Gort/Reservoirs-Lab/blob/main/src/elect...

Dimensionality reduction from n >> 2 dimensions to 2 dimensions can be very fickle, so the hyperparameters matter. Your visualization can change significantly depending on the choice of metric.

https://umap-learn.readthedocs.io/en/latest/parameters.html
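To make the point about metrics concrete, here is a minimal sketch in plain Python showing that the nearest neighbor of a point can differ between Euclidean and cosine distance. The toy vectors and helper names are made up for illustration. In umap-learn the distance is chosen through the `metric` parameter of `umap.UMAP`, next to `n_neighbors`, `min_dist`, and `n_components`, all of which are described on the parameters page linked above.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, points, dist):
    """Return the index of the point closest to `query` under `dist`."""
    return min(range(len(points)), key=lambda i: dist(query, points[i]))

# Toy vectors (made up for illustration).
query = [1.0, 1.0]
points = [
    [10.0, 10.0],  # same direction as the query, much larger magnitude
    [1.0, 0.0],    # close in position, but pointing 45 degrees away
]

nearest_euclidean = nearest(query, points, euclidean)     # picks [1.0, 0.0]
nearest_cosine = nearest(query, points, cosine_distance)  # picks [10.0, 10.0]
```

Since UMAP builds its layout from a nearest-neighbor graph, a change in who counts as a neighbor feeds straight into the 2-D picture.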

You may want to consider projecting to more than 2 dimensions too. You may ask, how does one visualize more than two dimensions? Through a scatterplot matrix of 2 axes at a time.

https://seaborn.pydata.org/examples/scatterplot_matrix.html

These are used for PCA-type multivariate analyses to visualize latent variables in higher dimensions than 2, but 2 dimensions at a time. Some clustering behavior that cannot be seen in 2 axes might be seen in higher dimensions. We used to do this in our lab to find anomalies in high dimensions.
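As a rough sketch of the "2 axes at a time" idea, the snippet below enumerates the panels of a scatterplot matrix for a k-dimensional projection and pulls out the 2-D points for each panel. The embedding values and the helper names `axis_pairs` and `panel_points` are invented for illustration; in practice something like seaborn's `pairplot` draws the panels for you.

```python
from itertools import combinations

def axis_pairs(n_components):
    """All unordered pairs of component indices: one scatterplot per pair."""
    return list(combinations(range(n_components), 2))

def panel_points(embedding, i, j):
    """Project every row of the embedding onto components i and j."""
    return [(row[i], row[j]) for row in embedding]

# Hypothetical 3-D projection of four points (values made up).
# The two groups only separate along component 2.
embedding = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 5.0],
    [0.0, 0.1, 0.0],
    [0.1, 0.1, 5.0],
]

pairs = axis_pairs(3)
panels = {pair: panel_points(embedding, *pair) for pair in pairs}
```

In this toy data the panel for components (0, 1) shows a single tight blob, while the panels involving component 2 split the points into two groups, which is the kind of structure a 2-D-only projection can hide.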

  • isoprophlex 10 months ago

    About fickleness... indeed I've found this kinda problematic when running large-d text embeddings through UMAP -- it always comes out spherical, blob-shaped, without any obvious segregation in the low-d projected space.

    IMO it's very difficult to make a "fire and forget" embedding interpreter. Maybe I never found the right parameters for UMAP, but the results of running it (or any dimension reduction algo) always left me a bit underwhelmed.

    • antman 10 months ago

      Have you tried PaCMAP? It should be better and faster

      • wenc 10 months ago

        Thanks for the pointer to PaCMAP.

        I just tried it. My verdict?

        PaCMAP >= UMAP >> t-SNE.

        UMAP captures the basic pattern but PaCMAP makes it crisper.

gregncheese 10 months ago

I have yet to find a better tool than the old Tensorflow projector: https://projector.tensorflow.org/

Granted, it requires preparing your data as TSV files first.
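For anyone who has not done the TSV step before, here is a minimal sketch using only the standard library. It assumes the usual projector layout: one tab-separated vector per line with no header, plus an optional metadata file with one label per line (no header when there is a single column). The function name and sample data are hypothetical.

```python
import csv
import io

def write_projector_tsv(vectors, labels, vectors_file, metadata_file):
    """Write embeddings and single-column labels as two TSV streams."""
    if len(vectors) != len(labels):
        raise ValueError("need exactly one label per vector")
    vec_writer = csv.writer(vectors_file, delimiter="\t", lineterminator="\n")
    meta_writer = csv.writer(metadata_file, delimiter="\t", lineterminator="\n")
    for vector, label in zip(vectors, labels):
        vec_writer.writerow(vector)    # one embedding per line, no header
        meta_writer.writerow([label])  # single metadata column, no header

# Made-up example data, written to in-memory buffers.
vectors_buf, metadata_buf = io.StringIO(), io.StringIO()
write_projector_tsv(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    ["cat", "dog"],
    vectors_buf,
    metadata_buf,
)
```

To write real files, pass handles opened with `open(path, "w", newline="")` instead of the `StringIO` buffers.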

  • wenc 10 months ago

    That is indeed an excellent tool. It lets you dynamically adjust and recompute UMAP and t-SNE.

z-gort 10 months ago

Lmk if anyone has any thoughts... if I could go back, I might not have gone with Electron.

Doing dimensionality reduction locally posed a few challenges in terms of application size--the idea was that by analyzing just a few thousand randomly sampled points you can get an idea of your data through a local GUI where you interact with your data and see some correlated metadata.
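One generic way to draw a fixed-size uniform random sample from a table of unknown length in a single pass is reservoir sampling. The sketch below is a textbook version (Algorithm R) in plain Python and is not necessarily how the app samples its points; the function name and the use of `range` as a stand-in for database rows are assumptions for illustration.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Uniformly sample up to k items from an iterable in one pass."""
    rng = random.Random(seed)
    sample = []
    for n, row in enumerate(rows):
        if n < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row n (0-based) replaces a reservoir slot with probability k / (n + 1).
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample

# Stand-in for streaming rows out of a large table.
sample = reservoir_sample(range(100_000), 2000, seed=0)
```

Memory use stays proportional to `k` regardless of table size, which fits the goal of keeping a local app small.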

Not sure if there's too much need for an individual GUI to go along with Postgres as a VectorDB, maybe people just do analysis separate from a normal "GUI"? But maybe not.

What do you think?

  • maxchehab 10 months ago

    Just some quick feedback: I can't copy & paste into the connection URL input form (on a Mac).

    Once loaded, I get the error "Table must contain a UUID column for vector visualization."

    I'm assuming it's trying to find an ID column for grouping? Can we manually specify this? My ID columns are varchars.

    • garybake 10 months ago

      Same here. I'm using langchain which creates a varchar id column. It also has different collections on the same table.

redwood 10 months ago

Have folks seen https://atlas.nomic.ai/ <-- absolutely beautiful vector visualization

  • dcreater 10 months ago

    A proprietary hosted solution that stands to gain as I uncover insights in my data? Hard pass.

  • Alifatisk 10 months ago

    Seems to require signing up just to view it.

paddy_m 10 months ago

README suggestions:

Put the animated gif at the top

Add subtitles to the gif explaining what you're doing.

  • dcreater 10 months ago

    If I had a nickel for every GUI/viz tool that buries the image/video or straight up doesn't have it in the README... It lends credence to the popular opinion that engineers don't know how to communicate.

abadid 10 months ago

Why use PostgreSQL instead of columnar databases that are likely to perform way better for these types of analytical workloads?

ddtaylor 10 months ago

Does this use pgVector?

  • z-gort 10 months ago

    It lets you visualize any column with type "EMBEDDING", and I think the only way to get that is through pgvector/pgvectorscale.

samanthasu 10 months ago

That is an excellent visualization!

dmezzetti 10 months ago

Very interesting, thanks for sharing!

thangngoc89 10 months ago

As a non-native English speaker who is not very familiar with vector databases, I find the title very ambiguous. I read it as "Postgres as a GUI for some VectorDB". Upon closer inspection, I realized that "Postgres as a VectorDB" is meant as a single phrase. Maybe shorten it to something else. Just my 2 cents.

  • colechristensen 10 months ago

    It’s just plain bad grammar, the title should be

    “Show HN: Reservoirs Lab, a Postgres VectorDB GUI”

    • monsieurbanana 10 months ago

      I think the confusing term is "VectorDB" which sounds like a name of an existing product. "A vector db GUI powered by Postgres"?