🚀 Genesis Radio - Healthcheck Response Runbook
Purpose
When an alert fires (Critical or Warning), this guide tells you what to do so that anyone can react quickly, even if the admin is not available.
🛠️ How to Use
- Every Mastodon DM or Dashboard alert gives you a timestamp, server name, and issue.
- Look up the type of issue in the table below.
- Follow the recommended action immediately.
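
If you want to pull the server name and severity out of an alert line before looking it up, something like the following may help. This is a minimal sketch, assuming alert lines follow the shape of the sample DM in this runbook (an emoji, the server name in square brackets, a severity word, a colon, then the message). The function names alert_server and alert_severity are hypothetical and not part of the healthcheck itself.

```shell
# Hypothetical helpers: split one alert line into its parts.
# Assumed line shape: "- <emoji> [server] SEVERITY: message"

# Print the text between the first "[" and the next "]".
alert_server() {
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[\([^]]*\)].*/\1/p'
}

# Print the severity word that follows "] " and ends at the colon.
alert_severity() {
    printf '%s\n' "$1" | sed -n 's/^[^]]*] *\([A-Za-z]*\):.*/\1/p'
}
```

Both functions print nothing when the line does not match the assumed shape, so an empty result is a cue to read the alert by hand.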
📋 Quick Response Table
Type of Alert | Emoji | What it Means | Immediate Action |
---|---|---|---|
Critical Service Failure | 🔚 | A key service (such as Mastodon or MinIO) is down | SSH into the server and run systemctl restart <service>. |
Disk Filling Up | 📈 | Free disk space critically low (under 10% free) | SSH in and delete old logs/backups to free up space immediately. |
Rclone Mount Error | 🐢 | Cache failed, mount not healthy | Restart the rclone mount process (usually systemctl restart rclone@<mount>, or remount manually). |
PostgreSQL Replication Lag | 💥 | Database replicas are falling behind | Check database health. Restart replication if needed. Alert admin if lag is >5 minutes. |
RAID Degraded | 🧸 | RAID array is degraded (missing a disk) | Open server console. Identify failed drive. Replace drive if possible. Otherwise escalate immediately. |
Log File Warnings | ⚠️ | Error patterns found in logs | Investigate. If system is healthy, log it for later. If errors worsen, escalate. |
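
The "under 10%" rule for Disk Filling Up can be confirmed from the shell before you start deleting anything. The following is a minimal sketch, assuming the threshold refers to free space on a single mount point. The names THRESHOLD_FREE_PCT, used_pct, and is_disk_critical are hypothetical and are not taken from the healthcheck script.

```shell
# Assumed threshold: alert when less than this percentage of space is free.
THRESHOLD_FREE_PCT=10

# Print the used-space percentage (digits only) for a mount point.
# df -P gives the portable one-line-per-filesystem layout; field 5 is capacity.
used_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when the given used percentage leaves
# less than THRESHOLD_FREE_PCT percent free.
is_disk_critical() {
    [ $((100 - $1)) -lt "$THRESHOLD_FREE_PCT" ]
}
```

For example, `is_disk_critical "$(used_pct /)" && echo "root filesystem critically low"` prints the message only when less than 10% of the root filesystem is free.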
💻 If Dashboard Shows
- ✅ All Green = No action needed.
- ⚠️ Warnings = Investigate soon. Not urgent unless repeated.
- 🚨 Criticals = Drop everything and act immediately.
🛡️ Emergency Contacts
Role | Name | Contact |
---|---|---|
Primary Admin | (You) | 845-453-0820 |
Secondary | Brice | CONTACT INFO HERE? |
(Replace placeholders with actual contact details.)
✍️ Example Cheat Sheet
Sample Mastodon DM:
🚨 Genesis Radio Critical Healthcheck 2025-04-28 14:22:33 🚨
⚡ 1 critical issue found:
- 🔚 [mastodon] CRITICAL: Service mastodon-web not running!
Brice should:
- SSH into Mastodon server.
- Run systemctl restart mastodon-web.
- Confirm the service is running again.
- If it fails or stays down, escalate to admin.
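
The restart-then-confirm steps above can be sketched as one shell function. This is an illustration under assumptions, not the team's official tooling: restart_and_verify, SYSTEMCTL, and VERIFY_DELAY are hypothetical names, the two-second wait is an arbitrary choice, and the commands will typically need root (for example via sudo).

```shell
# Command used to talk to systemd; overridable so the function can be dry-run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Restart a unit, wait briefly, then confirm it is active.
# Returns non-zero if the restart fails or the unit stays down.
restart_and_verify() {
    unit="$1"
    "$SYSTEMCTL" restart "$unit" || return 1
    sleep "${VERIFY_DELAY:-2}"
    if "$SYSTEMCTL" is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is still down; escalate to admin" >&2
        return 1
    fi
}
```

Calling `restart_and_verify mastodon-web` covers the sample alert. A non-zero exit status corresponds to the "escalate to admin" step.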
🌟 TL;DR
- 🚨 Criticals: Act immediately.
- ⚠️ Warnings: Investigate soon.
- ✅ Healthy: No action needed.
Stay sharp. Our uptime and service quality depend on quick, calm responses! 🛡️💪