• 2 Posts
  • 409 Comments
Joined 5 months ago
Cake day: September 9th, 2025



  • Appreciate all the info! I did find this calculator the other day, and it’s pretty clear the RTX 4060 in my server isn’t going to do much, though its NVMe may help with offloading.

    https://apxml.com/tools/vram-calculator

    I’m also not sure anything under 10 tokens per second would be usable, though I’ve never really tried it.

    I’d be hesitant to buy something just for AI that doesn’t also have RT (ray-tracing) cores, because I do a lot of Blender rendering. RDNA 5 is supposed to have more competitive ray-tracing hardware along with NPU cores, so I guess my ideal would be an SoC with a ton of RAM. Maybe by the time RDNA 5 releases, the RAM situation will have blown over and we’ll have much better options for AMD SoCs with strong compute capabilities that aren’t just a one-trick pony for rasterization or AI.
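
    Out of curiosity, here’s roughly the math a calculator like that is doing. A minimal back-of-envelope sketch in Python; the layer/head counts are assumptions for typical 70B- and 8B-class models, not exact figures:

    ```python
    # Rough VRAM estimate: quantized weights + fp16 KV cache.
    # Ignores activation/runtime overhead (typically another 1-2 GB).
    def vram_gb(params_b, bits, ctx=8192, layers=80, kv_heads=8, head_dim=128):
        weights = params_b * 1e9 * bits / 8                    # bytes for weights
        kv_cache = 2 * layers * kv_heads * head_dim * ctx * 2  # K and V, 2 bytes each
        return (weights + kv_cache) / 1e9

    print(f"70B @ 4-bit: ~{vram_gb(70, 4):.0f} GB")           # ~38 GB
    print(f"8B @ 4-bit: ~{vram_gb(8, 4, layers=32):.0f} GB")  # ~5 GB
    ```

    So a 4-bit 70B-class model wants roughly 38 GB before runtime overhead, while the 4060 has 8 GB; only small models fit entirely in its VRAM.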


  • I’ve been looking into self-hosting LLMs, and it seems a $10k GPU is kind of a requirement to run a decently-sized model at a reasonable tokens/s rate. There’s CPU and SSD offloading, but I’d imagine it would be frustratingly slow to use. I even find cloud-based AI like GH Copilot rather annoyingly slow. Even so, GH Copilot is like $20 a month per user, and I’d be curious what the actual cost per user is, considering the hardware and electricity costs.
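
    To put rough numbers on the offloading point: single-user token generation is approximately memory-bandwidth-bound, since each new token has to read every weight once. A sketch of the upper bound, with ballpark bandwidth figures (assumptions, not measurements):

    ```python
    # tok/s upper bound ~= memory bandwidth / bytes read per token (~model size)
    model_gb = 35  # hypothetical 70B model at 4-bit
    for tier, bw_gbps in [("H100-class HBM", 3350),
                          ("RTX 4060 VRAM", 272),
                          ("dual-channel DDR5", 80),
                          ("PCIe 4.0 NVMe", 7)]:
        print(f"{tier:>18}: ~{bw_gbps / model_gb:5.1f} tok/s")
    ```

    Anything that spills out of VRAM into system RAM or onto an SSD drops from dozens of tokens per second to low single digits or worse, which matches the “frustratingly slow” intuition.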

    What we have now is clearly an experimental first generation of the tech, but the industry is building out data centers as though it’s always going to require massive GPUs/NPUs with wicked quantities of VRAM to run these things. If it really will require huge data centers full of expensive hardware, where each user prompt takes minutes of compute time on a $10k GPU, then it can’t possibly be profitable to charge a nominal monthly fee to use this tech, but maybe there are optimizations I’m unaware of.

    Even so, if the tech does evolve and it becomes a lot cheaper to host these things, will all these new data centers still be needed? On the other hand, if the hardware requirements don’t decrease by an order of magnitude, it won’t be cost-effective to offer LLMs as a service, in which case I don’t imagine the new data centers will be needed either.
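
    On the cost-per-user question, here’s the back-of-envelope I keep coming back to. Every figure is an assumption, and the big optimization I’d initially missed is batching: one GPU serving many prompts concurrently:

    ```python
    # Amortized cost of a $10k inference GPU, spread over batched users.
    gpu_cost = 10_000                    # USD, written off over 3 years
    hours = 3 * 365 * 24
    power_kw, usd_per_kwh = 0.7, 0.15    # card + host draw, electricity price
    hourly = gpu_cost / hours + power_kw * usd_per_kwh

    concurrent_users = 100               # e.g. 1000 tok/s aggregate / 10 tok/s each
    per_user_month = hourly / concurrent_users * 24 * 30
    print(f"GPU-hour: ${hourly:.2f}, per always-on user-month: ${per_user_month:.2f}")
    ```

    That lands around $3.50 per continuously active user per month, and real users are idle most of the time, so $20/month isn’t obviously underwater on hardware and power alone. Whether it also covers training runs, staff, and the rest of the data center is another question.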


  • Richard Freymann explaining light

    Sounds like it’s not actually Feynman, it’s AI. Quoting the channel’s own disclaimer:

    This isn’t his voice — it’s our tribute to his teaching style, created purely for education and inspiration. No impersonation intended, just deep respect for one of history’s greatest teachers. 🙏

    All content is created to inspire, educate, and encourage reflection. This channel follows YouTube’s monetization policies, including clear labeling of synthetic media.