kiwicopple 2 years ago

Hey HN, yesterday we shared a post[0] about how you can store OpenAI embeddings in Postgres with pgvector.

This is a follow-up demonstration that adds a "ChatGPT" interface to our own docs. To use it, go to https://supabase.com/docs and type "cmd + /". This pulls up a "Clippy" interface where you can ask questions about Supabase (sorry in advance to mobile users).

In "Show HN" spirit, it's very hacky/MVP, so I expect it will break. We'd value any feedback.

[0] Storing OpenAI embeddings in Postgres: https://news.ycombinator.com/item?id=34684593

  • kiwicopple 2 years ago

    If you want to build something like this yourself, we sponsored Greg to create a video detailing all the steps: https://www.youtube.com/watch?v=Yhtjd7yGGGA

    Also special shoutout to pgvector (https://github.com/pgvector/pgvector), which is used to store all of the embeddings.

    • btown 2 years ago

      Note that pgvector isn't supported on any of the large cloud providers' hosted Postgres offerings, other than Supabase. https://github.com/pgvector/pgvector#hosted-postgres has instructions on how to add your voice to request it to be added!

      (It does seem that the ancient https://github.com/eulerto/pg_similarity is supported by RDS and Google Cloud, but it's hard to tell whether its performance characteristics received anything like the rigor with which pgvector seems to have been designed.)

    • Yenrabbit 2 years ago

      This is so great! Thank you, and Greg, if you read this, well done on an excellent video and project.

      • gregnr 2 years ago

        Thanks! I've had a blast working with Supabase on this, glad to hear that you find it interesting!

  • moooo99 2 years ago

    Feedback: Please add a button. This seems super useful, but unfortunately I can't test it. I'm on a non-US layout and can't use the shortcut "cmd + /". On my keyboard, "/" is opt + 7, and "opt + cmd + 7" unsurprisingly does nothing.

    • saltcod 2 years ago

      Thanks for this. Added a button to the navbar. Sorry about the delay there.

    • kiwicopple 2 years ago

      thanks for the feedback - we'll add a button as soon as OpenAI approves our increased spend-cap request

alokjnv10 2 years ago

That's so interesting. I'm sure other companies will integrate GPT into their developer documentation. We have built a similar tool called Corpora https://askcorpora.com It allows users to upload their PDF files, search through them in natural language, and perform Q&A. It uses the same technology that ClippyGPT uses. Do you have any feedback for us?

imjonse 2 years ago

I remember looking for Android integrations a few days ago but getting only 3rd party repos, so I thought maybe this interface can point to the most adequate one. With typical ChatGPT confidence it pointed me to the Supabase Android SDK at https://supabase.com/docs/android which is 404 :)

FinalBriefing 2 years ago

How far are we from Clippy being able to scaffold out migrations or table schema?

"Clippy, create a table called "users" with the fields: first name, last name, bio."

swyx 2 years ago

awesome velocity - i could have sworn this was another of your famous Launch Weeks but it's just a regular week for you now lol

there's been a lot of these bots coming out, and some of the poorly implemented ones are probably going to make this endeavor look bad, so i'm wondering about your thoughts - what Quality Control/testing approach do you think makes sense for an unbounded chat bot like this?

  • kiwicopple 2 years ago

    in our MVP checklist we had "XSS testing" and "Prompt injection", the latter somewhat cursory because there's no feasible way to prevent it right now. We found a lot of ways to "break out" of the prompt (which is also very visible since we're open source). Luckily, prompt break-outs are relatively benign (as long as we have spend-caps on OpenAI).

    The biggest win for companies like us is that something like this becomes ubiquitous, so that people get bored of prompt-hacking and then just use the tool like they are supposed to. Over time we'll add rate-limiting, caching, and other hardening.

    • redeux 2 years ago

      I’m actually working on an open source project to mitigate prompt injections, handle caching, etc., and I’m collecting instances of prompt misuse. I’d love to see your findings. Did you document them anywhere that I can review?

      • kiwicopple 2 years ago

        we're not saving the prompts in this version. It's really an MVP, we couldn't have stripped it back much more

        We'd be happy to share some prompts in the future (as long as we can determine that it respects our users' privacy/safety)

ushakov 2 years ago

It's a great idea, but I'm really concerned about the use of OpenAI in this

Have you tried implementing the same functionality using open-source models?

  • kiwicopple 2 years ago

    we haven't but we certainly will in future iterations

    • Raed667 2 years ago

      I'm curious what is the estimated cost for running this?

      • kiwicopple 2 years ago

        it looks like we're getting 1800 requests per hour - about $5/hour, which is pretty cheap considering the traffic this blog post is seeing. That said, only desktop users who have access to the keyboard and have read the blog post can use it. If we made it visible to everyone I think we would quickly run into API limits.

        • wonderfuly 2 years ago

          $5 for 1800 requests is about $0.0028 per request. The text-davinci-003 price is $0.02 per 1k tokens, so the average is about 139 tokens per request. That is far below my expectation.
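
          A quick way to sanity-check that arithmetic, as a minimal sketch. The inputs are only the figures quoted in this thread ($5/hour, 1800 requests/hour, $0.02 per 1k tokens); the function name is made up for illustration, and it assumes the whole spend is billed at that single per-token rate.

```python
# Figures quoted in the thread; the pricing is as stated there, not re-verified.
DOLLARS_PER_HOUR = 5.0
REQUESTS_PER_HOUR = 1800
DOLLARS_PER_1K_TOKENS = 0.02  # text-davinci-003 price quoted in the thread


def average_tokens_per_request(dollars, requests, dollars_per_1k_tokens):
    """Return (cost per request in dollars, implied average tokens per request)."""
    cost_per_request = dollars / requests
    tokens_per_request = cost_per_request / dollars_per_1k_tokens * 1000
    return cost_per_request, tokens_per_request


cost, tokens = average_tokens_per_request(
    DOLLARS_PER_HOUR, REQUESTS_PER_HOUR, DOLLARS_PER_1K_TOKENS
)
print(f"cost per request: ${cost:.4f}")
print(f"average tokens per request: {tokens:.0f}")
```

          With unrounded intermediates this works out to about $0.0028 and roughly 139 tokens per request.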

asontha 2 years ago

Awesome stuff! If anyone is interested in a hosted and managed version of this, let us know! We can do this on top of any set of information already (not just .mdx) and have it exposed either as an API or through production ready Slack and Discord bots :)

Email me: arvind@kyberinsurance.com

nemo44x 2 years ago

Very cool! I'm looking forward to "smart search bars" (coined by me) being commonplace, where you can search for things and it returns both relevant knowledge-base/web pages and its own color based on results, context, and other sources.

  • kiwicopple 2 years ago

    the neat thing about the vector implementation is that we can immediately return the relevant pages while GPT formulates an answer, which gives this a hybrid "search + ask" function. We'll aim to do that in the next iteration
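
    To make the "search" half of that concrete, here is a minimal sketch of ranking pages by embedding similarity. It is not the Supabase implementation: per the thread the embeddings live in Postgres via pgvector, whereas this stand-in ranks a tiny in-memory dict in plain Python. The page paths, the 3-dimensional vectors, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_pages(query_embedding, page_embeddings, k=2):
    """Return the k page paths whose embeddings are closest to the query."""
    ranked = sorted(
        page_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [path for path, _ in ranked[:k]]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
PAGES = {
    "/docs/auth": [0.9, 0.1, 0.0],
    "/docs/storage": [0.1, 0.9, 0.1],
    "/docs/database": [0.2, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I sign users in?"
results = top_pages(query, PAGES)
print(results)  # ['/docs/auth', '/docs/database']
```

    Because this ranking needs only the query embedding, the page list can be shown right away, before the slower generated answer arrives.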

    • gl-prod 2 years ago

      Hey, if you can, please add a Stop Generating button. Sometimes I see what I'm looking for and just want it to stop.

  • say_it_as_it_is 2 years ago

    how did you ever come up with such a clever name!

williamcotton 2 years ago

Everyone realizes this works for the same reason that Copilot works, right? And that this was trained on the same GPL code?

Is this not an excellent example of an argument that LLMs such as GPT should be able to train on copyright protected works as it constitutes fair use?

  • serverlessmania 2 years ago

    I think for code it will be pretty hard, unless the writer gives explicit permission to train the model on the copyrighted data, like the case for Copilot in GitHub.

    And I don't think it is fair use at all. Imagine a company like OpenAI trained the model on its own internal docs and code: then you'd be able to ask the model to replicate ChatGPT and Copilot, or even closed software like Photoshop.

    • williamcotton 2 years ago

      I'm sorry, what don't you think is fair use, Supabase Clippy?

      • serverlessmania 2 years ago

        I'm not talking about Supabase Clippy, but more about training models on copyrighted data without asking for permission (like private copyrighted code in GitHub for example and yes, I don't call that fair use)

        • williamcotton 2 years ago

          Supabase Clippy uses the same trained model as Copilot, OpenAI’s GPT family of large-language models, including having trained on all of the code in GitHub, without having asked permission and without regard to copyright license.

          These tools are being released under the assumption that training large-language models will be found as fair use of any copyrighted works.

          Are y’all starting to see the arguments for why the model and the outputs of the model are two different issues and that the models themselves, and the products built on top of them, will be considered fair use and that the liability for copyright infringement lies completely with the person using the tool?

gherkinnn 2 years ago

Supabase keeps on giving.

I don’t use it myself, but whatever they’re doing looks impressive from the outside.

counttheforks 2 years ago

> Want to try it out? It's a hidden feature while in MVP - visit supabase.com/docs and hit cmd + / to pull up the search box (sorry mobile users).

My keyboard does not have a cmd key, and I am not a mobile user.

  • CSDude 2 years ago

    My keyboard does not have a dedicated "/" key. Neither ctrl nor cmd + shift + 7 (/) works either.

    • saltcod 2 years ago

      Added a button to the navbar!

  • saltcod 2 years ago

    Hey — one of the Clippy engineers here. Ctrl work for you?

  • kiwicopple 2 years ago

    5 mins - we can push another key combo

    • moralestapia 2 years ago

      Yes, please, it's not working on my setup either.

      I have to do Shift + 7 to get the "/", so maybe that breaks it; also Ctrl/Meta + "/" is already mapped to an OS X function.

      What about Ctrl/Meta + Space? I've seen that around.

      Also, lol, you release a great thing and the discussion turned out to be around the hotkeys, ha.

    • e12e 2 years ago

      And on mobile?

      • kiwicopple 2 years ago

        No mobile for now - that's our artificial way of 50/50 rate limiting this before all our credits run out

  • atgctg 2 years ago

    `Ctrl + /` should also work

    • moooo99 2 years ago

      Unfortunately this is not a possible key combination in many non-US keyboard layouts.

      • saltcod 2 years ago

        Hey — sorry about the delay on this. Just pushed up a button in the navbar.

burmecia 2 years ago

AFAIK, ChatGPT's training data only runs until 2021. Can it answer questions based on the latest Supabase documents?

k__ 2 years ago

lol, I hoped there was a Google Docs alternative by Supabase (that has a Clippy)

yreg 2 years ago

Please explain to me what's Supabase, but make it rhyme.

    Supabase is a great place,
    For hosting your database.
    It's open source and free,
    So you can use it with glee.
    It's simple to get started,
    No infrastructure to be charted.
    It's a hosted platform,
    So you can use it with no qualm.

  • kiwicopple 2 years ago

    fun anecdote: we have a bunch of integration guides (supabase + some other product), and in the first few iterations we found that Clippy was recommending those products instead of Supabase. eg: "how can I get started with auth" would recommend trying out supertokens (another great open source auth solution)

  • nemosaltat 2 years ago

    For anyone struggling with that final couplet, try an English accent.

    • mcbuilder 2 years ago

      Great, not only is ChatGPT better at rhyming than me, it also sounds more sophisticated while doing it. LLMs fail to pass for sophomoric when viewed by an expert, but to the mediocre it exposes one's shortcomings. Maybe because they sort of are big giant averaging machines.

  • gl-prod 2 years ago

    Interesting, I got the same result. Could it be caching prompts?

    • JamesSwift 2 years ago

      They currently have it configured to be fully deterministic when generating. If you tweak the prompt at all it should break out of that I assume.

      • kiwicopple 2 years ago

        The temperature is currently set to zero, so that makes sense
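
        Why temperature zero gives the same answer every time can be sketched with a generic temperature sampler. This is a textbook illustration, not OpenAI's actual decoding code; the logits and the function name are hypothetical.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from logits; temperature 0 degenerates to argmax."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature, shifted by the max for numerical stability.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens
rng = random.Random(0)

# Collect the distinct tokens chosen over 100 draws at each setting.
greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}
print(greedy)
print(len(sampled) > 1)
```

        At temperature 0 every draw picks token 0, so an identical prompt yields an identical completion without any cache being involved; at temperature 1.0 the draws spread across several tokens.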