Ask HN: When/where can we get an offline GPT3 type chatbot?
OpenAI's new ChatGPT is super neat and all, but I had lots of issues with it; sometimes I felt like I was running into arbitrary guard rails because they are worried about bad PR or something.
Also the fact that you have no control over the weights or any other modifications to help steer it toward certain areas.
After having been majorly spoiled by StableDiffusion being offline, and all the community mods/changes that have been contributed to it, I now want an offline chatbot model.
I think I read that there are some older GPT-2 models available offline, but also that most of them are still considered 'inefficient'. What does this mean? Is it the compute needed to use them, or the physical size of the model? Would it be at all possible to split one into groups (e.g. I only care about English and programming languages, not other human languages)?
I am sorry if this is common knowledge to those in the know, but could someone share some details on whether what I am asking is silly (like asking for an offline version of a search engine), or whether I am asking the wrong questions?
I think the best publicly available model that can follow instructions right now is https://huggingface.co/bigscience/bloomz.
It has 176 billion trainable parameters, but I think it uses up terabytes of memory, so there is a trade-off between model size and capability.
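Back-of-the-envelope: memory for the weights alone is parameter count times bytes per parameter, which is why a 176B model is out of reach for consumer hardware at any common precision. A quick sketch of that arithmetic:

```python
def model_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough RAM needed just to hold the model weights."""
    return n_params * bytes_per_param / 1024**3

# BLOOMZ: 176 billion parameters
print(model_memory_gb(176e9, 4))  # fp32, ~656 GB
print(model_memory_gb(176e9, 2))  # fp16, ~328 GB
print(model_memory_gb(176e9, 1))  # int8, ~164 GB
```

And that is before activations, KV cache, or any fine-tuning state, which push the total even higher.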
On most GPUs, you should be able to run https://huggingface.co/google/flan-t5-large. It is pretty good and is trained to follow instructions.
InstructGPT is actually 1.3B, you don't need 175B like initial GPT-3 model: https://openai.com/blog/instruction-following/
ChatGPT is likely a much better variant, but it also must still be small and will likely be portable.
ChatGPT is based on GPT3.5 which is a series of 175B models that includes code-davinci-002, text-davinci-002, and text-davinci-003. I do not think of it as portable.
Network distillation to the rescue! Provided that you have the means to distill in the first place :D
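For anyone unfamiliar, distillation trains a small student model to match a big teacher's output distribution, typically by minimizing KL divergence between temperature-softened softmaxes of their logits. A minimal NumPy sketch of that loss (illustrative only, not any particular library's API):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student exactly reproduces the teacher's distribution and grows as they diverge; in practice it is mixed with the ordinary hard-label loss.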
Would it be possible to run this entire model sequentially, dumping intermediate results to SSD?
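In principle yes: you load one layer's weights at a time, apply it, write the activation back to disk, and move on, trading speed for memory. A toy NumPy sketch of the idea (fake .npy files stand in for model shards; real frameworks like Accelerate do this far more efficiently):

```python
import numpy as np
import os

def make_fake_shards(dirpath, n_layers=4, dim=8, seed=0):
    """Write per-layer weight files to disk, standing in for model shards."""
    rng = np.random.default_rng(seed)
    for i in range(n_layers):
        np.save(os.path.join(dirpath, f"layer{i}.npy"),
                rng.standard_normal((dim, dim)) * 0.1)
    return n_layers

def forward_offloaded(dirpath, x, n_layers):
    """Forward pass holding only one layer's weights in RAM at a time."""
    act_path = os.path.join(dirpath, "act.npy")
    np.save(act_path, x)
    for i in range(n_layers):
        w = np.load(os.path.join(dirpath, f"layer{i}.npy"))  # load one shard
        x = np.tanh(np.load(act_path) @ w)                   # compute the layer
        np.save(act_path, x)                                 # dump back to "SSD"
        del w                                                # free before next shard
    return np.load(act_path)
```

The result matches an all-in-memory forward pass; the cost is that every token generated re-reads the whole model from disk, so expect it to be orders of magnitude slower.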
I've just tried out a few of the flan-t5s and they're surprisingly coherent. It likes dogs and pine trees, thinks Makerbot is the best 3D printer lol. Can even generate some code, though it's usually wrong. And it can't seem to decide if a flower pot is orange or red.
Any idea if it's possible to chain-feed a conversation into this one? I've tried a few Q&A formats but none seem to really keep the old context.
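These models are stateless, so the usual trick is to re-feed prior turns inside the prompt yourself, dropping the oldest turns when you exceed the model's short context window. A minimal sketch (the Q:/A: format and character budget are illustrative assumptions, not anything flan-t5 specifically requires):

```python
def build_prompt(history, question, max_chars=1500):
    """Pack prior (question, answer) turns plus the new question into
    one prompt, dropping the oldest turns when over budget."""
    turns = [f"Q: {q}\nA: {a}" for q, a in history]
    prompt = f"Q: {question}\nA:"
    # Add turns newest-first until the budget is exhausted.
    for t in reversed(turns):
        candidate = t + "\n" + prompt
        if len(candidate) > max_chars:
            break
        prompt = candidate
    return prompt
```

You then feed the whole assembled string to the model each turn; in practice you would budget in tokens rather than characters.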
Here is an example of one general purpose open source LLM, probably the best you can get:
https://github.com/EleutherAI/gpt-neox
To manage your expectations: it is nowhere near as good as ChatGPT.
If you are interested in programming only:
https://github.com/salesforce/CodeGen
is decent.
Can CodeGen handle programming languages other than Python?
Here's a full list of the big ones:
https://lifearchitect.ai/models/
https://lifearchitect.ai/timeline/
There are another 97,000 Transformer models on HF:
https://huggingface.co/models