• Semi-Hemi-Demigod@kbin.social
    arrow-up
    177
    arrow-down
    5
    ·
    1 year ago

    Sysadmin pro tip: Keep a 1-10GB file of random data named DELETEME on your data drives. Then if this happens you can get some quick breathing room to fix things.

    Also, set up alerts for disk space.
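For anyone who wants to try the DELETEME trick, here is a minimal sketch of creating the reserve file. The function name `create_reserve_file` and the example path are my own assumptions, not anything from the comment above. It writes random bytes in chunks so the blocks are actually allocated and a 10GB file never has to sit in memory.

```python
import os


def create_reserve_file(path: str, size_bytes: int, chunk_size: int = 1024 * 1024) -> int:
    """Write size_bytes of random data to path and return the number of bytes written.

    Hypothetical helper for illustration only.
    """
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            # Never write more than what is left to reach the target size.
            n = min(chunk_size, size_bytes - written)
            f.write(os.urandom(n))
            written += n
        # Push the data to disk so the space is really consumed.
        f.flush()
        os.fsync(f.fileno())
    return written


# Example call (hypothetical path, 5GB reserve):
# create_reserve_file("/data/DELETEME", 5 * 1024**3)
```

Deleting the file later frees that space immediately, which is the whole point of keeping it around.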

      • nfh@lemmy.world
        English
        arrow-up
        30
        ·
        1 year ago

        Why not both? Alerting to find issues quickly, a bit of extra storage so you have more options available in case of an outage, and maybe some redundancy for good measure.

        • RupeThereItIs@lemmy.world
          English
          arrow-up
          14
          ·
          1 year ago

          A system this critical is on a SAN; if you’re alerting properly, adding a bit more storage space is a 5-minute task.

          It should also have a DR solution, yes.

          • Nightwatch Admin@feddit.nl
            English
            arrow-up
            1
            ·
            1 year ago

            A system this critical is on a hypervisor with tight storage “because deduplication” (I’m not making this up).

            • RupeThereItIs@lemmy.world
              English
              arrow-up
              5
              ·
              1 year ago

              This is literally what I do for a living. Yes, deduplication and thin provisioning.

              This is still a failure of monitoring or slow response to it.

              You keep your extra capacity handy on the storage array, not with some junk files on the filesystem.

              You also need to know how overprovisioned you are and when you’re likely to run out of capacity… you know this from monitoring.

              Then, when management fails to react promptly to your warnings, shit like this happens.
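To make the "when you’re likely to run out" part concrete, here is a rough sketch that fits a straight line to usage samples from monitoring and extrapolates to full. The function name `days_until_full`, the `(day, used_bytes)` sample format, and the assumption of roughly linear growth are all mine. On a thin-provisioned, deduplicated array, `capacity_bytes` would have to be the physical pool capacity, not the sum of the provisioned volumes.

```python
def days_until_full(samples, capacity_bytes):
    """Estimate days from the last sample until usage reaches capacity.

    samples: list of (day, used_bytes) pairs taken from monitoring.
    Returns None when usage is flat or shrinking.
    Hypothetical helper; assumes growth is roughly linear.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    # Least-squares slope: bytes of growth per day.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    # Day on which the fitted line crosses the capacity.
    day_full = (capacity_bytes - intercept) / slope
    last_day = max(t for t, _ in samples)
    return max(0.0, day_full - last_day)
```

A number like this, sent in writing along with the raw samples, is the kind of warning management can be asked to act on.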

              • Semi-Hemi-Demigod@kbin.social
                arrow-up
                3
                ·
                1 year ago

                Then when management fails to react promptly to your warnings. Shit like this happens.

                The important part is that you have your warnings in writing, and BCC them to a personal email so you can cover your ass

      • Agent641@lemmy.world
        English
        arrow-up
        19
        arrow-down
        1
        ·
        1 year ago

        Yes, alert me when disk space is about to run out so I can ask for a massive raise and quit my job when they don’t give it to me.

        Then when TSHTF they pay me to come back.

      • ipkpjersi@lemmy.ml
        English
        arrow-up
        15
        ·
        1 year ago

        A lot of companies have minimal alerting or no alerting at all. It’s kind of wild. I literally have better alerting in my home setup than many companies do lol

            • IonAddis@lemmy.world
              English
              arrow-up
              4
              ·
              1 year ago

              I imagine it’s a case where, if you’re knowledgeable, yeah, it’s free. But if you have to hire knowledgeable people to implement the free solution, you still have to pay them. And companies love to balk at that!

              • ipkpjersi@lemmy.ml
                English
                arrow-up
                2
                ·
                1 year ago

                I think it’s that, plus any IT employees they have wouldn’t be allowed to work on it because they’d be busy with other stuff. Companies don’t prioritize alerting, since they don’t know how important it is until it’s too late.

      • looz
        English
        arrow-up
        9
        ·
        1 year ago

        There are cases where a disk fills up quicker than one can reasonably react, even if alerts are in place. And sometimes the culprit is something you can’t just go and kill.

    • dx1@lemmy.world
      English
      arrow-up
      53
      ·
      1 year ago

      The real pro tip is to put the core system and anything that eats up disk space on separate partitions, along with alerting, log rotation, etc. And also to not have a single point of failure in general. Hard to say exactly what went wrong w/ Toyota, but they probably could have planned better for it in a general way.

    • Lem453@lemmy.ca
      English
      arrow-up
      31
      ·
      edit-2
      1 year ago

      Even better: run a cron job every 5 minutes, and if the remaining free space falls to 5%, auto-delete the file and send a message to the sysadmin.
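A sketch of what that cron-driven check could look like, under stated assumptions: the function names, the paths, and the `notify` callback are hypothetical, and `print` stands in for whatever actually pages the sysadmin (mail, chat webhook, and so on). Only `shutil.disk_usage` and `os.remove` from the standard library are used.

```python
import os
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def release_reserve_if_low(mount_point, reserve_file, threshold=0.05, notify=print):
    """Delete the reserve file and notify when free space is at or below threshold.

    Returns True only if the reserve file was actually deleted.
    Hypothetical helper for illustration.
    """
    frac = free_fraction(mount_point)
    if frac > threshold:
        return False
    try:
        os.remove(reserve_file)
    except FileNotFoundError:
        # The reserve was already spent, so this is the more urgent case.
        notify(f"{mount_point}: {frac:.1%} free and no reserve file left")
        return False
    notify(f"{mount_point}: {frac:.1%} free, deleted {reserve_file}")
    return True


# Example call (hypothetical paths):
# release_reserve_if_low("/data", "/data/DELETEME")
```

Saved as a script, it could be scheduled with a crontab entry such as `*/5 * * * * /usr/bin/python3 /opt/scripts/check_disk.py`, where the script path is made up for the example.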

      • Semi-Hemi-Demigod@kbin.social
        arrow-up
        21
        ·
        1 year ago

        Sends a message and gets the services ready for potential shutdown. Or implements a rate limit to keep the service available but degraded.

      • bug@lemmy.one
        English
        arrow-up
        3
        ·
        1 year ago

        At that point, just set the alert threshold a few gigs higher and don’t have the decoy file at all.

    • Maximilious@kbin.social
      arrow-up
      29
      arrow-down
      1
      ·
      edit-2
      1 year ago

      10GB is nothing in an enterprise datastore housing PBs of data. 10GB is nothing for my 80TB homelab!

    • z00s@lemmy.world
      English
      arrow-up
      8
      ·
      1 year ago

      Or make the file a little larger and wait until you’re up for a promotion…