Auto commit from /home/doc/genesis-tools
parent 475284c1ee
commit 046d5204ea
OPS.md: 14 changed lines
@@ -16,12 +16,12 @@ When an alert fires (Critical or Warning), this guide tells you what to do so th
 
 | Type of Alert | Emoji | What it Means | Immediate Action |
 |:---|:---|:---|:---|
-| Critical Service Failure | 🔚 | A key service (like Mastodon, MinIO) is **down** | SSH into the server, try `systemctl restart <service>`. |
-| Disk Filling Up | 📈 | Disk space critically low (under 10%) | SSH in and delete old logs/backups. Free up space **immediately**. |
-| Rclone Mount Error | 🐢 | Cache failed, mount not healthy | Restart the rclone mount process. (Usually a `systemctl restart rclone@<mount>`, or remount manually.) |
-| PostgreSQL Replication Lag | 💥 | Database replicas are falling behind | Check database health. Restart replication if needed. Alert admin if lag is >5 minutes. |
-| RAID Degraded | 🧸 | RAID array is degraded (missing a disk) | Open server console. Identify failed drive. Replace drive if possible. Otherwise escalate immediately. |
-| Log File Warnings | ⚠️ | Error patterns found in logs | Investigate. If system is healthy, **log it for later**. If errors worsen, escalate. |
+| [Critical Service Failure](#critical-service-failure-) | 🔚 | A key service (like Mastodon, MinIO) is **down** | SSH into the server, try `systemctl restart <service>`. |
+| [Disk Filling Up](#disk-filling-up-) | 📈 | Disk space critically low (under 10%) | SSH in and delete old logs/backups. Free up space **immediately**. |
+| [Rclone Mount Error](#rclone-mount-error-) | 🐢 | Cache failed, mount not healthy | Restart the rclone mount process. (Usually a `systemctl restart rclone@<mount>`, or remount manually.) |
+| [PostgreSQL Replication Lag](#postgresql-replication-lag-) | 💥 | Database replicas are falling behind | Check database health. Restart replication if needed. Alert admin if lag is >5 minutes. |
+| [RAID Degraded](#raid-degraded-) | 🧸 | RAID array is degraded (missing a disk) | Open server console. Identify failed drive. Replace drive if possible. Otherwise escalate immediately. |
+| [Log File Warnings](#log-file-warnings-) | ⚠️ | Error patterns found in logs | Investigate. If system is healthy, **log it for later**. If errors worsen, escalate. |
 
 ---
 
@@ -35,7 +35,7 @@ When an alert fires (Critical or Warning), this guide tells you what to do so th
 ## 🛡️ Emergency Contacts
 | Role | Name | Contact |
 |:----|:-----|:--------|
-| Primary Admin | (You) | [YOUR CONTACT INFO] |
+| Primary Admin | (You) | [845-453-0820] |
 | Secondary | Brice | [BRICE CONTACT INFO] |
 
 (Replace placeholders with actual contact details.)
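
The "Disk Filling Up" row in OPS.md treats free space under 10% as critical. A minimal sketch of that threshold check is below, assuming Python is available on the server. The function names, the constant, and the default path are hypothetical illustrations, not part of OPS.md or of the alerting setup behind it.

```python
import shutil

# Assumption: "under 10%" in the alert table means free space / total space < 0.10.
CRITICAL_FREE_FRACTION = 0.10


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the fraction of the disk that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def is_disk_critical(total_bytes: int, free_bytes: int,
                     threshold: float = CRITICAL_FREE_FRACTION) -> bool:
    """True when free space is strictly below the threshold."""
    return free_fraction(total_bytes, free_bytes) < threshold


def check_path(path: str = "/") -> bool:
    """Apply the check to a mounted path using real disk statistics."""
    usage = shutil.disk_usage(path)
    return is_disk_critical(usage.total, usage.free)


if __name__ == "__main__":
    # Hypothetical default: check the root filesystem.
    print("critical" if check_path("/") else "ok")
```

Keeping the comparison in `is_disk_critical` separate from the `shutil.disk_usage` call makes the threshold logic easy to verify without touching a real disk.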