• 1 Post
  • 162 Comments
Joined 6 months ago
Cake day: August 27th, 2025

  • Yeah, me too. Opus 4.5 is awesome, but my god…om nom nom go my daily/weekly quotas. I probably shouldn't yeet the entire repo at it lol.

    4.6 is supposedly about 2x the token burn for not much better output.

    Viewed against that, Codex 5.3 @ medium is actual daylight robbery of OAI, in our favour.

    I was just looking at benchmarks, and even smaller 8-10B models are now at around 65-70% of Sonnet's level (Qwen3-8B, Nemotron 9B, Critique) and 110-140% of Haiku's.

    If I had the VRAM, I'd switch to local Qwen3-Next (which scores almost 90% of Opus 4.5 on SWE-bench) and just git gud. Probably I'll just stick to smaller models, API calls, and the git gud part.

    An RTX 3060 (probably the minimum you'd need to run Qwen3-Next decently) is $1500 here :(

    For that much $$$ I can probably get 5 years of surgical API calls via OR (OpenRouter) + actual skills.
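
    By "surgical" I just mean one-off, tightly scoped requests straight at the API instead of letting an agent loop graze on my quota. A minimal Python sketch of what I have in mind, assuming the requests library and a placeholder model id (swap in whatever's listed and cheap on OpenRouter that week):

    ```python
    import requests

    # One-off "surgical" call to OpenRouter's OpenAI-compatible endpoint.
    # The model id below is a placeholder; pick whatever is listed/cheap.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer YOUR_OPENROUTER_API_KEY"},
        json={
            "model": "qwen/qwen3-8b",  # placeholder id
            "messages": [{"role": "user", "content": "Explain this traceback: ..."}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
    ```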

    PS: how are you using batch processing? How did you set it up?



  • Ah, but a subscription to OpenAI ChatGPT ($20 USD/month) gives you Codex 5.3 bundled in, with some really generous usage allowances (well, compared to Claude).

    I haven’t looked recently, but API calls to Codex 5.2 via OR were silly expensive per million tokens; I can’t imagine 5.3 is any cheaper.

    To be fair to your point: I doubt many people sign up specifically for this (let's say 20%, if we're making up numbers). It's still a good deal though. I can chew thru 30 million tokens in pretty much a day when I'm going hammer and tongs at stuff.
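
    Back-of-envelope on why that matters, with completely made-up prices (none of these are real OAI/OR rates, just illustrating the gap):

    ```python
    # Rough math only: every price here is an assumption, not a quote.
    tokens_per_day = 30_000_000   # the "30 million tokens in a day" above
    usd_per_million = 1.50        # assumed blended input/output API price
    heavy_days = 20               # hammer-and-tongs days per month

    api_monthly = tokens_per_day / 1_000_000 * usd_per_million * heavy_days
    print(f"API: ~${api_monthly:,.0f}/month vs a $20 subscription")  # ~$900 vs $20
    ```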

    Frankly, I don’t understand how OAI remain solvent. They’re eating a lot of shit in their “undercut the competition to take over the market” phase. But hey, if they’re giving it away, sure, I’ll take it.


  • Let’s be fair - not all of the masses are so ignorant.

    If you compare API vs subscription, you probably get more bang for your buck paying $20 USD/month than paying per million tokens via API calls. At least for OAI models. It's legitimately a good deal for heavy users.

    For simpler stuff and/or if you have decent hardware? For sure - go local. Qwen3-4B 2507 Instruct matches or surpasses GPT-4.1 nano and mini on almost all benchmarks…and you can run it on your phone. I know because it (or the abliterated version) is my go-to at home. It's stupidly strong for a 4B.
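
    If you want to kick the tyres yourself, here's a minimal sketch with Hugging Face transformers (assuming the Qwen/Qwen3-4B-Instruct-2507 checkpoint and enough RAM for a 4B; on a phone you'd run a GGUF build via llama.cpp or similar instead):

    ```python
    from transformers import pipeline

    # Downloads the 4B instruct weights on first run.
    pipe = pipeline("text-generation", model="Qwen/Qwen3-4B-Instruct-2507")

    # Recent transformers versions accept chat-style message lists directly.
    out = pipe(
        [{"role": "user", "content": "Write a Python one-liner for FizzBuzz."}],
        max_new_tokens=128,
    )
    print(out[0]["generated_text"][-1]["content"])  # assistant's reply
    ```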

    But if you need SOTA (or near it) and are rocking typical consumer-grade hardware, then $20/month for basically unlimited tokens is the reason to subscribe.





  • I really like Claude, but the way it chews thru tokens def cements it as a “rich man’s” AI. Codex surprised me with how capable it is versus how little it costs to run. Previously, I’d been trying to use ChatGPT + web + project containers…with really sub-par refactoring results.

    Tbf, I’ve only really used Claude Opus 4.5 and GPT Codex 5.3 for code, so pardon my ignorance.

    How well do open-weight models like Kimi et al. stack up? Can I call them via VSCodium to reason over a local mirror of the files in my repo? I’m hardware-bound with limited compute. I’ve played around a bit with OpenRouter before, so I have passing familiarity with things like TNG DeepSeek R1T2, mimo-v2-flash, etc.
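
    What I’m imagining is roughly this: point an OpenAI-compatible client at OpenRouter and stuff local file contents into the prompt. The Kimi model id below is a guess from memory, so it’d need checking against the actual OpenRouter listing:

    ```python
    import os
    from openai import OpenAI

    # OpenRouter speaks the OpenAI API, so the stock client works with a
    # different base_url. Expects OPENROUTER_API_KEY in the environment.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    with open("src/main.py") as f:   # any file from the local repo mirror
        source = f.read()

    resp = client.chat.completions.create(
        model="moonshotai/kimi-k2",  # guessed id, verify before relying on it
        messages=[{"role": "user",
                   "content": "Review this file for bugs:\n\n" + source}],
    )
    print(resp.choices[0].message.content)
    ```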