• AliasAKA@lemmy.world
    9 months ago

    It’s also the energy required to train the model. Inference is usually far more efficient (not always, but almost always significantly so), because there’s no error backpropagation or other training-specific computation.

    Models probably take on the order of 1,000 MWh of energy to train (GPT3 took 284 MWh by OpenAI’s calculation). That’s not including the web scraping, data cleaning, and other associated costs (such as cooling the server farms, which is non-trivial).

    A coal plant takes roughly 364 kg to 500 kg of coal to generate 1 MWh. So for GPT3 you’d be looking at 103,376 kg (~228 thousand pounds, or about 114 US tons) at minimum to train it. That’s before anybody has used it, and without counting the other associated energy costs. For comparison, a typical home may use 6 MWh per year. So just training GPT3 could’ve powered 47 homes for an entire year.
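
    A rough sketch of the arithmetic above, for anyone who wants to check it. It takes the quoted 284 figure as MWh, uses the comment's own coal range and 6 MWh per home, and the variable names are purely illustrative:

```python
# Back-of-envelope check of the coal and household figures quoted above.
# All inputs are the thread's own numbers, not independently verified.
TRAINING_MWH = 284             # quoted GPT3 training energy, read as MWh
COAL_KG_PER_MWH = (364, 500)   # quoted low and high coal use per MWh
HOME_MWH_PER_YEAR = 6          # quoted typical household consumption
KG_PER_LB = 0.45359237         # exact definition of the pound
LB_PER_US_TON = 2000           # US (short) ton

coal_kg_low = TRAINING_MWH * COAL_KG_PER_MWH[0]
coal_kg_high = TRAINING_MWH * COAL_KG_PER_MWH[1]
coal_lb_low = coal_kg_low / KG_PER_LB
coal_tons_low = coal_lb_low / LB_PER_US_TON
homes = TRAINING_MWH / HOME_MWH_PER_YEAR

print(f"coal: {coal_kg_low:,} kg to {coal_kg_high:,} kg")
print(f"low end: {coal_lb_low:,.0f} lb, {coal_tons_low:.0f} US tons")
print(f"homes powered for a year: {homes:.1f}")
```

    The low end works out to a bit under 228 thousand pounds, so the rounded figures in the comment are in the right ballpark.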

    Edit: also, it’s not nearly as bad as crypto mining. And as another person says, it’s totally moot if we have clean sources of energy to fill the need and the grid can handle it. Unfortunately we have neither right now.

    • sushibowl@feddit.nl
      9 months ago

      If you amortize training costs over all inference uses, I don’t think 1,000 MWh is too crazy. For a model like GPT3 there are likely millions of inference calls to split that cost between.

      • AliasAKA@lemmy.world
        9 months ago

        Sure, and I think these models may even be useful enough to warrant the cost. I’m just saying that this still isn’t simply running a couple of light bulbs. This is a major draw on the grid (though it likely still pales in comparison to crypto farms).

        Note that most people would be better off using a model that’s trained for a specific task. For example, training an image recognition model uses vastly less energy because the models are much smaller, yet they’re excellent at image recognition.

        • Zaktor
          9 months ago

          The article claims 200M ChatGPT requests per day. Assuming they make a new version yearly, that’s 73B requests per training. Spreading 1,000 MWh across 73B requests yields a per-request amortized cost of about 0.014 Wh. It’s nothing.
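
          The amortization can be sketched the same way. This assumes the 1,000 figure is MWh, one training run per year, and the 200M requests per day the article claims; the names are illustrative only:

```python
# Amortize one training run over a year of requests.
# Inputs are the thread's own assumptions, not measured values.
REQUESTS_PER_DAY = 200e6     # claimed ChatGPT requests per day
DAYS_PER_TRAINING = 365      # assumes one new model version per year
TRAINING_MWH = 1000          # rough training energy from the thread, read as MWh
WH_PER_MWH = 1e6

requests_per_training = REQUESTS_PER_DAY * DAYS_PER_TRAINING
wh_per_request = TRAINING_MWH * WH_PER_MWH / requests_per_training

print(f"requests per training run: {requests_per_training:.2e}")
print(f"amortized training energy: {wh_per_request:.4f} Wh per request")
```

          Note this only covers the training share; the energy each request uses for inference is a separate number.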

          47 more households’ worth of electricity just isn’t a major draw on anything. We add ~500,000 households a year from natural growth.