Automatically restarting hung or crashed Pleroma using Monit

Overview of the issue

If you've run a Pleroma instance for more than a few months you know how much it loves shitting itself. 500 internal server errors are so common that they're kind of a meme already, but they're annoying at worst and that's not what we'll be solving here. What we'll be solving here is a much more serious issue: the Pleroma service completely crashing. From time to time (anywhere from days to months apart; on my instance it's currently every few days) the pleroma service will just stop working. There's a bunch of generic "run Windows update" type advice, ranging from tuning PostgreSQL and rate limiting incoming federation to updating Pleroma (lol) or even building Pleroma from source and debugging it yourself (lmao). Spoiler alert: some of these might result in marginal, barely noticeable improvements, but most won't do shit (if they did you wouldn't be here). Anyway, the fastest and easiest solution is to just let it crash and restart it when it does, and I'm about to show you how to do that.

A few notes about my setup

I'm running an OTP build of stock Pleroma, 2.4.3 as of writing, but because of how this setup works it should work with any fork and build (Pleroma, Akkoma, Rebased, source or OTP; basically if you can restart it with sudo systemctl restart pleroma you're good to go).

Getting your machine ready

You might be thinking, just put Restart=on-failure in pleroma.service and systemd will handle the rest. Well guess what, it's already there. Why isn't Pleroma getting restarted then? Turns out, when Pleroma "crashes" it's not really crashing, it's more like locking up or something, so to systemd it looks as if the service is still running. So how do we know when Pleroma has crashed/got stuck/whatever? Well, the same way you as a user know it: by monitoring the web service. If it's not returning 200 it's fucked, restart it. The simplest solutions truly are the best, aren't they?
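If you want to see what that check looks like by hand before handing it over to Monit, here's a rough shell sketch. The function names and the example.com URL are made up for illustration; the only real pieces are curl and its -w '%{http_code}' option.

```shell
#!/bin/sh
# Print the HTTP status code a URL returns.
# curl prints 000 when it gets no response at all (timeout, refused connection).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Succeed (exit 0) when the status code is anything other than 200,
# i.e. when the service should be considered fucked.
needs_restart() {
    [ "$1" != "200" ]
}

# Usage (swap the hypothetical URL for your own instance):
#   code=$(http_status "https://example.com/api/pleroma/healthcheck")
#   if needs_restart "$code"; then echo "Pleroma is down (got $code)"; fi
```

This is the same idea Monit will run for us on a schedule, minus the part where it actually restarts anything.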

What the fuck is Monit and how does it work?

As you might assume from the name, Monit is a pretty cool tool that can be used for monitoring a bunch of shit. We'll be using it for monitoring and restarting Pleroma.

How the fuck do you do it?

Installation

I'm on Ubuntu so I install it with sudo apt install monit. After installing run sudo systemctl enable monit to enable it to start at boot.

Setup

The Monit config file is at /etc/monit/monitrc. Open it with whatever text editor. I'm using vim here because I'm a masochist: sudo vim /etc/monit/monitrc

The first option we're interested in is set daemon 120 and, below it, with start delay 120. The first one determines how often (in seconds) Monit runs its checks, and the start delay is how long Monit waits after it starts before running the first check. 120 seconds (2 minutes) is too short for our setup: Pleroma takes a few minutes to restart, so you might end up with a scenario where Pleroma is continuously getting restarted and never gets the chance to fully start up. You'll probably want to experiment with that on your own machine. In my case I set both to 360 seconds (6 minutes), like set daemon 360 and with start delay 360. Just to be safe you might want to set it even higher, 10 minutes or whatever.
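For the "experiment on your own machine" part, something like this can give you a rough number for how long your Pleroma takes to come back after a restart. It's a sketch, not gospel: time_until_up is a name I made up, the example.com URL is a placeholder, and it only counts the time spent sleeping between attempts, so the result is approximate.

```shell
#!/bin/sh
# Run a check command every INTERVAL seconds until it succeeds, then print
# roughly how many seconds that took. Gives up (exit 1) after MAX_WAIT seconds.
# Usage: time_until_up MAX_WAIT INTERVAL COMMAND [ARGS...]
time_until_up() {
    max_wait=$1
    interval=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -le "$max_wait" ]; do
        # "$@" is the check command; its output is thrown away
        if "$@" >/dev/null 2>&1; then
            echo "$elapsed"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Usage (hypothetical URL; curl -f exits non-zero on HTTP errors like 502):
#   sudo systemctl restart pleroma
#   time_until_up 900 5 curl -sf -o /dev/null "https://example.com/api/pleroma/healthcheck"
```

Whatever number you get, set the Monit interval comfortably above it.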

Second one is email config for notifications. Here's my setup, although my setup doesn't utilize these so I have no idea whether this works or not.

	set mailserver fedora.email port 587
	  username [REDACTED] password "[REDACTED]"
	set alert shitpisscum@cock.li  #email address which will receive monit alerts
	

UPDATE: It does not work because fedora.email requires STARTTLS and I'm too lazy to figure it out rn. I'll leave this part commented out since I don't really care about email notifications.

Next is the web interface. In the set httpd port 1234 and line, replace the port or leave the default one or whatever, idk. Next, uncomment use address localhost to only allow connections from localhost, or uncomment allow admin:monit to enable remote connections so you can open it in any web browser; obviously replace the default user and password. I tried this and it works but I'm not sure I'd recommend enabling it.

Save the file and exit. Run sudo monit -t to test if you fucked something up; you should get "Control file syntax OK". If so, run sudo monit reload.

Now the fun part. Open the config file again with sudo vim /etc/monit/monitrc and add this at the end of the file:

	check host pleroma with address shitpisscum.mooo.com
	  if failed
		port 443
		protocol https
		request "/api/pleroma/healthcheck"
		status = 200
	  then exec "/bin/systemctl restart pleroma"
	

I'm requesting /api/pleroma/healthcheck but you can request /api/v1/instance or whatever endpoint. You don't care about the content; as long as it returns 200 it's fine. In then exec "/bin/systemctl restart pleroma" replace pleroma with akkoma, rebased or whatever the service is called. Save the file and exit. Run sudo monit -t to test if you fucked something up; you should get "Control file syntax OK". If so, run sudo monit reload.
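Since you'll be doing the test-then-reload dance every time you touch the config, you could wrap it in a tiny helper so a broken monitrc never gets reloaded. This is just a sketch: reload_if_ok and the MONIT variable are names I made up, while monit -t and monit reload are the same commands used above.

```shell
#!/bin/sh
# Command used to call Monit; defaults to going through sudo.
# Override it (e.g. MONIT=monit) if you're already root.
MONIT=${MONIT:-sudo monit}

# Reload Monit only if the control file passes the syntax check.
reload_if_ok() {
    if $MONIT -t; then
        $MONIT reload
    else
        echo "monitrc has errors, not reloading" >&2
        return 1
    fi
}

# Usage:
#   reload_if_ok
```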

That should be it. Run sudo monit status. You should see something like this:

	Remote Host 'pleroma'
	  status                       OK
	  monitoring status            Monitored
	  monitoring mode              active
	  on reboot                    start
	  port response time           70.407 ms to shitpisscum.mooo.com:443/api/pleroma/healthcheck type TCP/IP using TLS (certificate valid for 28 days) protocol HTTP
	  data collected               Wed, 22 Mar 2023 00:22:55
	

If so, congrats. You just fixed your Pleroma instance. You're welcome.
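If you'd rather check that from a script than eyeball it, here's a rough sketch that reads monit status output and tells you whether a given service says OK. It assumes the output looks like the sample above (an unindented header line ending in the quoted service name, followed by indented fields); monit_service_ok is a name I made up, and the format may differ between Monit versions.

```shell
#!/bin/sh
# Read `monit status` output on stdin and succeed only if the named
# service's "status" line says OK.
# Usage: sudo monit status | monit_service_ok pleroma
monit_service_ok() {
    awk -v want="'$1'" '
        # Header lines are not indented and end with the quoted service name.
        /^[^[:space:]]/ { in_block = ($NF == want) }
        # Inside the wanted block, remember whether the status field is OK.
        in_block && $1 == "status" { ok = ($2 == "OK") }
        END { exit !ok }
    '
}
```

It exits non-zero both when the status is something other than OK and when the service isn't in the output at all.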

