I’ve been running an HPC system for a science group for a while now and have built a couple of different systems based on common HPC infrastructures (ROCKS or OpenHPC). These have been built on top of the rebuilt RHEL distros (mostly CentOS), but I don’t really need the level of stability they provide and would actually like the sort of updates you get from something like CentOS Stream, so this seems like a good time to try it.

The problem is that I haven’t found an HPC framework that natively supports this, so I may have to roll my own. I don’t need anything fancy, just some way to automatically deploy nodes and set up Slurm to get jobs queued.
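
To be concrete, "set up Slurm" here means nothing more than a single controller plus a handful of identical compute nodes. A minimal slurm.conf along these lines would cover it (the hostnames, core counts and memory figures below are invented placeholders):

```
# One controller ("head") and four identical compute nodes
ClusterName=lab
SlurmctldHost=head
ProctrackType=proctrack/cgroup
SelectType=select/cons_tres
NodeName=node[01-04] CPUs=32 RealMemory=128000 State=UNKNOWN
PartitionName=batch Nodes=node[01-04] Default=YES MaxTime=INFINITE State=UP
```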

Any pointers to suitable frameworks or tools which would help with this and which aren’t tied to older distros?

  • xylan@kbin.social (OP) · 1 year ago

    The lack of stability is actually quite attractive to me. In a scientific environment we’re normally running fairly new, often unstable code, and we often hit problems because of using older versions of libraries / packages / compilers, so something that stays a bit more current would be good, and we can deal with breakage if it happens. The trouble is that the management systems around HPC assume you’re working on enterprise systems, which isn’t really true in our case.

    I’ve looked at things like OpenHPC, but they’re still on RHEL8 (RHEL9 support is in testing but not released yet), and even lower-level tools like Warewulf still only support RHEL8 at the moment, which is getting too old for me to want to build a new system on.

    I’ve looked at more generic tools like Ansible and Chef / Puppet, but before I go down that rabbit hole I’d like a sanity check that there isn’t something better suited that I’m missing.
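
    For what it’s worth, the Ansible route I’m picturing would be something like the sketch below, sitting on top of whatever handles the bare-metal provisioning. Everything in it is an assumption on my part: the EPEL package names (munge, slurm-slurmd), the inventory group name "compute", and the files/munge.key and templates/slurm.conf.j2 paths are just placeholders.

    ```yaml
    # Playbook sketch: turn a freshly provisioned CentOS Stream node into a Slurm compute node
    - hosts: compute
      become: true
      tasks:
        - name: Enable EPEL (Slurm and munge are packaged there)
          ansible.builtin.dnf:
            name: epel-release
            state: present

        - name: Install munge and the Slurm compute daemon
          ansible.builtin.dnf:
            name:
              - munge
              - slurm-slurmd
            state: present

        - name: Distribute the cluster-wide munge key (source path is a placeholder)
          ansible.builtin.copy:
            src: files/munge.key
            dest: /etc/munge/munge.key
            owner: munge
            group: munge
            mode: "0400"

        - name: Deploy slurm.conf from a template (template not shown here)
          ansible.builtin.template:
            src: templates/slurm.conf.j2
            dest: /etc/slurm/slurm.conf

        - name: Start and enable munge and slurmd
          ansible.builtin.systemd:
            name: "{{ item }}"
            state: started
            enabled: true
          loop:
            - munge
            - slurmd
    ```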