• chiisana@lemmy.chiisana.net
    5 days ago

    There are some really compelling open-source models like Zonos coming out; ElevenLabs will need to figure out how to thread the needle and keep everyone happy while other solutions eat into its share of the pie.

    • Blass Rose@pawb.social
      3 days ago

      Oh, I’m glad this tech went somewhere useful! I remember reading the paper and toying with the models they released as a proof of concept like… 8 years ago? It was really powerful even back then. The ability to do TTS of someone’s voice given literally 3 seconds of training data?! (In fact, I found that it worked better with short, nonsense audio clips than with clips of someone actually saying anything. Saying “test test test” worked way better than reading an actual sentence.) But now it looks like it can actually handle tone well. It’s also probably way better now, and less… asthmatic-sounding.