There has been an overwhelming number of new models hitting Hugging Face. I wanted to kick off a thread and see what open-source LLM has become your new daily driver?
Personally, I am using many Mistral/Mixtral models and a few random OpenHermes fine-tunes for flavor. I was also pleasantly surprised by some of the DeepSeek models. Those were fun to test.
I believe 2024 is the year open-source LLMs will catch up with GPT-3.5 and GPT-4. We’re already most of the way there. Curious to hear what new contenders are on the block and how others feel about their performance/precision compared to state-of-the-art closed-source models.
This one is only 7B parameters, but it punches far above its weight for such a little model:
https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha
My personal setup is capable of running larger models, but for everyday use like summarization and brainstorming, I find myself coming back to Starling the most. Since it’s so small, it runs inference blazing fast on my hardware. I don’t rely on it for writing code, though. Deepseek-Coder-33B is my pick for that.
Others have said Starling’s overall performance rivals LLaMA 70B. YMMV.
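For anyone wondering why a 7B model feels so much snappier: token generation is mostly memory-bandwidth-bound, so a rough upper bound on decode speed is just bandwidth divided by the (quantized) weight size. A quick back-of-the-envelope sketch (the bandwidth figure is a made-up example, not my actual hardware):

```python
# Rule of thumb: decode tokens/sec <= memory bandwidth / bytes read per token,
# where bytes per token is roughly the full quantized weight size.
# This is an upper bound; real throughput is lower due to overhead.

def approx_tokens_per_sec(params_billion: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Bandwidth-bound estimate of decode speed in tokens/second."""
    model_gb = params_billion * bits_per_weight / 8  # quantized weights in GB
    return bandwidth_gb_s / model_gb

# Hypothetical GPU with ~1000 GB/s of memory bandwidth:
print(approx_tokens_per_sec(7, 5, 1000))   # 7B at 5-bit: fast
print(approx_tokens_per_sec(70, 5, 1000))  # 10x the params ~ 10x slower
```

Which is basically why a 7B model can feel an order of magnitude faster than a 70B one on the same card.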
What sort of tokens per second are you seeing with your hardware? Mind sharing some notes on what you’re running there? Super curious!
I would also be interested in Copilot-style code models that reach the same performance as GitHub’s or Microsoft’s paid offerings.
Currently I use TabbyML, but the available models are far inferior.
Of all the code-specific LLMs I’m familiar with, Deepseek-Coder-33B is my favorite. There are multiple pre-quantized versions available here:
https://huggingface.co/TheBloke/deepseek-coder-33B-base-GGUF/tree/main
In my experience, 5-bit quantization is the minimum that performs well.
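When picking which GGUF file to grab, a rough size estimate helps you choose a quant that fits your RAM/VRAM. A minimal sketch, using size ≈ parameters × bits per weight / 8 (ignores GGUF metadata and per-block overhead, so real files run slightly larger):

```python
# Approximate GGUF file size: params (billions) x bits per weight / 8.
# Ignores format overhead, so treat the result as a lower bound.

def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

# Rough sizes for a 33B model at common quantization levels:
for bits in (2, 4, 5, 6, 8):
    print(f"33B @ {bits}-bit ~ {gguf_size_gb(33, bits):.1f} GB")
```

So a 5-bit quant of a 33B model needs on the order of 20+ GB of memory before you even account for the KV cache.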
I was pleasantly surprised by many models of the Deepseek family. Verbose, but in a good way? At least that was my experience. Love to see it mentioned here.
Personally I find myself renting GPUs and running Goliath 120B. Smaller models could do what I’m doing if I spent more time optimizing my prompts, but every day I’m doing different tasks, and Goliath 120B will just handle whatever I throw at it, no matter how sloppy I am. I’ve also been playing with LLaVA and Hermes vision models to describe images to me. However, when I really need alt-text for an image I can’t see, I still find myself resorting to GPT-4; the open-source options just aren’t as accurate or detailed.