• BetaDoggo_@lemmy.world
    4 months ago

    How many times is this same article going to be written? Model collapse from synthetic data is not a concern at any scale when human data is in the mix. There are already entire model series trained mostly on synthetic data: https://huggingface.co/docs/transformers/main/model_doc/phi3. When training entirely on unassisted model outputs, error does accumulate with each generation, but that isn't a concern in any realistic scenario.
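
    A toy sketch of the error-accumulation point. This is not how LLMs are actually trained. It assumes a 1-D Gaussian stands in for the "model", and `fit` and `run` are hypothetical names made up for this illustration. Each generation refits the Gaussian to its training set. With no real data the fitted spread should shrink toward zero, while mixing fresh real samples back in should keep it anchored near the true value of 1.

```python
import math
import random


def fit(samples):
    """'Train' the toy model: maximum-likelihood mean and std of a Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)


def run(generations, n, real_fraction, seed=0):
    """Refit a Gaussian on partly self-generated data for many generations.

    Each generation trains on n points: a real_fraction share drawn fresh from
    the true distribution N(0, 1), the rest sampled from the previous model.
    Returns the final model's standard deviation.
    """
    rng = random.Random(seed)
    n_real = int(n * real_fraction)
    # Generation 0 is trained on real data only.
    mu, sigma = fit([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(generations):
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        mu, sigma = fit(real + synthetic)
    return sigma


if __name__ == "__main__":
    print("synthetic only:", run(3000, 100, 0.0))
    print("50% real data: ", run(3000, 100, 0.5))
```

    With these settings the synthetic-only run should end with a standard deviation far below 0.1, while the 50/50 run should stay close to 1. Every refit multiplies the variance by a random factor whose logarithm averages slightly below zero, so training only on your own outputs compounds that error; the real samples keep pulling the estimate back.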

    • Something Burger 🍔@jlai.lu
      4 months ago

      As the number of articles about this exact subject increases, so does the likelihood of AI only being able to write about this very subject.