• areyouevenreal@lemm.ee
    link
    fedilink
    English
    arrow-up
    2
    ·
    4 months ago

    It’s a bit more complicated than that.

    New models are sometimes targeting architecture improvements instead of pure size increases. Any truly new model still needs training time, it’s just that the training time isn’t going up as much as it used to. This means that open weights and open source models can start to catch up to large proprietary models like ChatGPT.

    From my understanding GPT 4 is still a huge model and the best performing. The other models are starting to get close though, and can already exceed GPT 3.5 Turbo which was the previous standard to beat and is still what a lot of free chatbots are using. Some of these models are still absolutely huge though, even if not quite as big as GPT 4. For example Goliath is 120 billion parameters. Still pretty chonky and intensive to run even if it’s not quite GPT 4 sized. Not that anyone actually knows how big GPT 4 is. Word on the street is it’s a MoE model like Mixtral which run faster than a normal model for their size, but again no one outside Open AI actually can say with certainty.

    You generally find that Open AI models are larger and slower. Wheras the other models focus more on giving the best performance at a given size as training and using huge models is much more demanding. So far the larger Open AI models have done better, but this could change as open source models see a faster improvement in the techniques they use. You could say open weights models rely on cunning architectures and fine tuning versus Open AI uses brute strength.