Table of Contents

🛠️ How to Use
📋 Quick Response Table
💻 If Dashboard Shows
🛡️ Emergency Contacts
✍️ Example Cheat Sheet for Brice

🌟 TL;DR
🛠️ Genesis Radio - Detailed Ops Playbook

Critical Service Failure (🔚)
Disk Filling Up (📈)
Rclone Mount Error (🐢)
PostgreSQL Replication Lag (💥)
RAID Degraded (🧸)
Log File Warnings (⚠️)

When an alert fires (Critical or Warning), this guide tells you what to do so that any team member can react quickly, even if the admin is not available.

🛠️ How to Use

Every Mastodon DM or Dashboard alert gives you a timestamp, server name, and issue.
Look up the type of issue in the table below.
Follow the recommended action immediately.

📋 Quick Response Table

Type of Alert	Emoji	What it Means	Immediate Action
Critical Service Failure	🔚	A key service (like Mastodon, MinIO) is down	SSH into the server, try `systemctl restart <service>`.
Disk Filling Up	📈	Disk space critically low (under 10% free)	SSH in and delete old logs/backups to free up space immediately.
Rclone Mount Error	🐢	Cache failed, mount not healthy	Restart the rclone mount process. (Usually a `systemctl restart rclone@<mount>`, or remount manually.)
PostgreSQL Replication Lag	💥	Database replicas are falling behind	Check database health. Restart replication if needed. Alert admin if lag is >5 minutes.
RAID Degraded	🧸	RAID array is degraded (missing a disk)	Open server console. Identify failed drive. Replace drive if possible. Otherwise escalate immediately.
Log File Warnings	⚠️	Error patterns found in logs	Investigate. If system is healthy, log it for later. If errors worsen, escalate.

💻 If Dashboard Shows

✅ All Green = No action needed.
⚠️ Warnings = Investigate soon. Not urgent unless repeated.
🚨 Criticals = Drop everything and act immediately.

🛡️ Emergency Contacts

Role	Name	Contact
Primary Admin	(You)	[845-453-0820]
Secondary	Brice	[BRICE CONTACT INFO]

(Replace placeholders with actual contact details.)

✍️ Example Cheat Sheet for Brice

Sample Mastodon DM:

🚨 Genesis Radio Critical Healthcheck 2025-04-28 14:22:33 🚨
⚡ 1 critical issue found:

🔚 [mastodon] CRITICAL: Service mastodon-web not running!

Brice should:

SSH into Mastodon server.
Run `sudo systemctl restart mastodon-web`.
Confirm the service is running again.
If it fails or stays down, escalate to admin.
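
To pick the server and service names out of an alert line like the one in the sample DM, a small shell helper can do the reading for you. This is a minimal sketch: `parse_alert` is a hypothetical name, and it assumes the alert keeps the `[server] CRITICAL: Service <name> not running!` layout shown in the sample. Other alert types will not match.

```shell
# Sketch: print "<server> <service>" for a service-down alert line.
# Returns non-zero if the line does not look like that kind of alert.
parse_alert() {
  line=$1
  case $line in
    *'['*'] CRITICAL: Service '*' not running!'*) ;;
    *) return 1 ;;
  esac
  server=${line#*\[}       # drop everything up to the first "["
  server=${server%%\]*}    # drop everything from the first "]" onward
  service=${line#*Service }
  service=${service%% *}   # keep the first word after "Service "
  printf '%s %s\n' "$server" "$service"
}

# Usage:
#   parse_alert '🔚 [mastodon] CRITICAL: Service mastodon-web not running!'
```

The first word of the output is the server to SSH into and the second is the unit name to restart.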

🌟 TL;DR

🚨 Criticals: Act immediately.
⚠️ Warnings: Investigate soon.
✅ Healthy: No action needed.

🛠️ Genesis Radio - Detailed Ops Playbook

Critical Service Failure (🔚)

Symptoms: Service marked as CRITICAL.

Fix:

SSH into server.
sudo systemctl status <service>
sudo systemctl restart <service>
Confirm the service is running. Check its logs if the restart fails.
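
The restart-and-confirm steps can be wrapped in one function so nothing gets skipped under pressure. This is a sketch under assumptions: `restart_and_verify` is a hypothetical name, the 50-line log tail is an arbitrary choice, and a service that is slow to start may briefly report as not active right after the restart.

```shell
# Sketch: restart a unit, confirm it came back, and show recent logs if not.
restart_and_verify() {
  service=$1
  sudo systemctl restart "$service"
  if sudo systemctl is-active --quiet "$service"; then
    echo "$service is running"
  else
    echo "$service is still down; recent log lines:" >&2
    sudo journalctl -u "$service" -n 50 --no-pager >&2
    return 1
  fi
}

# Usage:
#   restart_and_verify mastodon-web || echo "Escalate to admin"
```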

Disk Filling Up (📈)

Symptoms: Disk space critically low.

Fix:

SSH into server.
df -h

Delete old logs:

sudo rm -f /var/log/*.gz /var/log/*.[0-9]
sudo journalctl --vacuum-time=2d

If space is still low, find the largest files and directories and clean them up.
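
Two small helpers may make the disk check quicker. This is a sketch, not part of the monitoring setup: `over_threshold` and `biggest_dirs` are hypothetical names, the 90% figure simply mirrors the "under 10% free" alert rule, and `sort -h` assumes GNU coreutils.

```shell
# Sketch: list mount points at or above a usage threshold (percent used).
# Reads `df -P` output on stdin, so it also works on saved output.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    used = $5
    sub(/%/, "", used)
    if (used + 0 >= limit + 0) print $6, $5
  }'
}

# Sketch: show the 10 largest directories under a path,
# staying on a single filesystem (-x).
biggest_dirs() {
  sudo du -xh "$1" 2>/dev/null | sort -rh | head -n 10
}

# Usage:
#   df -P | over_threshold 90
#   biggest_dirs /var
```

Check what a directory holds before deleting anything from it; the helpers only point at where the space went.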

Rclone Mount Error (🐢)

Symptoms: Mount failure or slowness.

Fix:

SSH into SPL server.

Unmount & remount:

sudo fusermount -uz /path/to/mount
sudo systemctl restart rclone@<mount>

Confirm mount is active.
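
One way to confirm the mount is to look for the path in the kernel's mount table. This is a minimal sketch: `is_mounted` is a hypothetical name, and the optional second argument exists only so the check can be tried against a saved copy of the mount table.

```shell
# Sketch: succeed if the given path is listed as a mount point.
# Reads /proc/mounts by default.
is_mounted() {
  awk -v path="$1" '$2 == path { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}"
}

# Usage:
#   is_mounted /path/to/mount && echo "mount is active"
```

A FUSE mount can be listed yet unresponsive, so it is worth also running `ls` on the path to confirm that files are readable.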

PostgreSQL Replication Lag (💥)

Symptoms: Replica database lagging.

Fix:

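SSH into replica server.

Check lag on the replica:

sudo -u postgres psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"

For a per-replica view, run this on the primary instead (on a replica with no downstream standbys it returns no rows):

sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"

Restart PostgreSQL on the replica if replication is stuck.
Monitor replication logs. Alert admin if lag is >5 minutes.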
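
The 5-minute escalation rule from the Quick Response Table can be checked mechanically. This is a sketch under assumptions: both function names are hypothetical, and the query relies on PostgreSQL's built-in `pg_last_xact_replay_timestamp()`, which is meaningful only when run on the replica.

```shell
# Sketch: succeed if a replay lag (whole seconds) is over the 5-minute limit.
lag_needs_escalation() {
  [ "$1" -gt 300 ]
}

# Sketch: read the replica's replay delay in whole seconds.
# Caveat: if the primary is idle, this number grows even though
# the replica is not actually behind.
replica_lag_seconds() {
  sudo -u postgres psql -Atc \
    "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())::int, 0);"
}

# Usage:
#   lag=$(replica_lag_seconds) && lag_needs_escalation "$lag" && echo "Alert admin"
```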

RAID Degraded (🧸)

Symptoms: RAID missing a disk.

Fix:

SSH into server.
cat /proc/mdstat
Find failed drive:
```
sudo mdadm --detail /dev/md0
```
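If the failed drive is still listed in the array, mark it failed and remove it. Then replace the disk and add the new one to start the rebuild (`/dev/sdX` is a placeholder for the drive's device name):
```
sudo mdadm /dev/md0 --fail /dev/sdX --remove /dev/sdX
sudo mdadm --add /dev/md0 /dev/sdX
```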
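
To spot which arrays are degraded without reading `/proc/mdstat` by eye, the status markers can be scanned with `awk`. This is a minimal sketch: `degraded_arrays` is a hypothetical name, and it relies on the usual mdstat layout, where a healthy two-disk mirror shows `[UU]` and an underscore, as in `[U_]`, marks a missing or failed member.

```shell
# Sketch: print the names of md arrays whose status shows a missing member.
# Reads /proc/mdstat by default; pass a file to check saved output.
degraded_arrays() {
  awk '
    /^md/ { name = $1 }                     # remember the current array name
    /\[[U_]*_[U_]*\]/ { print name }        # status such as [U_] or [_U]
  ' "${1:-/proc/mdstat}"
}

# Usage:
#   degraded_arrays
```

Run it again during a rebuild; the array keeps appearing until every member is back to `U`.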

Log File Warnings (⚠️)

Symptoms: Errors in syslog or nginx.

Fix:

SSH into server.
Review logs:
```
grep ERROR /var/log/syslog
```
Investigate. Escalate if necessary.
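
To judge whether errors are getting worse, counting matches at two points in time is often enough. This is a sketch: `count_errors` is a hypothetical name, the default pattern `ERROR` matches the `grep` above, and the log must be readable by the user running it (use `sudo` otherwise).

```shell
# Sketch: count lines in a log file that match a pattern (default: ERROR).
count_errors() {
  grep -c -e "${2:-ERROR}" "$1"
}

# Usage:
#   count_errors /var/log/syslog
#   count_errors /var/log/syslog 'CRITICAL'
```

If the count keeps climbing between checks a few minutes apart, treat that as errors worsening and escalate.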

Stay sharp. Early fixes prevent major downtime! 🛡️💪