hi,

I got the whisper_stt extension running in Oobabooga and it (kinda) works. However, it seems really bad at understanding my speech, and recognition has been spotty at best.

I saw some YouTube tutorials where it seemed to have no problem understanding speech, even when spoken with quite a strong accent, but in my own experience it performs nowhere near as well as shown there.

So: are there things I can do to improve its performance? Or might the YouTube tutorials have been edited to give a false impression, and spotty performance is simply what to expect?

I’m very happy with silero_tts, and if I can get the speech-to-text to work at the same level, I’d be a happy camper already.

Edit: It seems to be a memory problem. I can select several models in the extension interface: tiny, small, base, medium, … If I choose the tiny or small model, it works, but with the poor results I mentioned above. If I select the medium model I get an OOM error, something like: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 11.99 GiB total capacity; 11.14 GiB already allocated; 0 bytes free; 11.22 GiB reserved in total by PyTorch). It looks to me as if the language model reserves the whole of my 12 GB of VRAM and doesn’t leave any for the extension. Is there a way to tweak that?
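
A quick way to watch where the VRAM goes while the models load (plain nvidia-smi, nothing specific to the webui):

    # print used/total VRAM once per second while the LLM and then Whisper load
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1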

Edit 2: OK, so if I use a smaller language model (like a 6B model), the medium Whisper model seems to work perfectly fine … so it is probably a memory issue. I have already tried starting with the command-line flag --gpu-memory set to 5, 8, and 10, which doesn’t seem to do anything. Are there other ways to manage memory?
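
For completeness, this is roughly what I’ve been launching with, plus the other memory-related flags I’ve seen mentioned for the webui. I’m not sure they all apply to my loader or version, and the model name is a placeholder, so treat this as a sketch:

    # what I tried: cap the VRAM the language model may claim (value in GiB)
    python server.py --model <your-model> --gpu-memory 10

    # other flags I’ve seen mentioned: spill overflow to system RAM, or load the LLM in 8-bit
    python server.py --model <your-model> --auto-devices --load-in-8bit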

  • Blaed@lemmy.world (mod) · 11 months ago

    You could try reducing your memory overhead by going down to a 3B-parameter model. If you want to avoid that, maybe experiment with different models in both GPTQ and GGML formats?
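
    For the GGML route, a rough sketch of partial GPU offload with the llama.cpp loader (flag names and the filename are illustrative and may differ between webui versions):

        # keep only some layers on the GPU; the rest stays in system RAM,
        # which leaves VRAM free for the medium Whisper model
        python server.py --model llama-7b.ggmlv3.q4_0.bin --n-gpu-layers 20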

    If you’re willing to spend a few dollars an hour, you could drastically increase available memory and compute by running it on a rented GPU through something like vast.ai or runpod.ai. Might be worth exploring for any tests that need extra oomph.

    Given time, I think many of these models will become easier to run as new optimization and runtime methods begin to emerge.