# 🧾 Postmortem: Mastodon Object Storage Migration to Secure S3 (MinIO)
**Date:** April 30, 2025
**Engineer:** Doc (Genesis Radio / Genesis Hosting)

---
## 🎯 Objective

Migrate Mastodon's object storage from an older MinIO bucket (`linodeassets`) to a new **ZFS-backed, encrypted** MinIO instance (`mastodonassets-secure`) on `shredderv2`, while maintaining uptime and improving storage performance and security.

---
## 🧱 Infrastructure Touched

- Mastodon (Docker-based, hosted on Linode)
- MinIO S3 Object Storage (`oldminio` → `secureminio`)
- Nginx (reverse proxy for Console + S3 endpoints)
- ZFS dataset: `nexus/mastodonassets` (pool `nexus`)
- Domains:
  - `shredderv2.sshjunkie.com` (S3 API)
  - `consolev2.sshjunkie.com` (MinIO Console UI)

---

## ⚠️ Issues Encountered

1. **403 Access Denied on Mastodon startup**
   - ✅ Root cause: the `genesisadminv2` MinIO user had no policy attached
   - 🔧 Fixed by attaching a policy in the Console UI once Console access was restored

2. **MinIO Console unreachable (`consolev2.sshjunkie.com`)**
   - The SSL certificate for the domain was missing
   - 🔧 Issued a new certificate with `certbot certonly --standalone`, then re-enabled the full HTTPS proxy

3. **Sync race conditions**
   - Some media files were uploaded to the old bucket during the long transfer
   - 🔧 Mitigated by running an additional `rclone sync` pass before cutover

4. **Transfer performance bottlenecks**
   - The MinIO client (`mc mirror`) was too slow
   - ✅ Switched to `rclone` and saw a drastic speed improvement

5. **SPL (StationPlaylist) freezing during asset access**
   - Root cause: the rclone VFS cache choked on sparse-file writes under ext4
   - ✅ Fix: moved critical rclone mounts to ZFS-backed drives
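
The catch-up pass from issues 3 and 4 can be sketched as a small shell wrapper. This is a minimal sketch under stated assumptions, not the exact commands run during the migration: it assumes `oldminio` and `secureminio` are configured rclone remotes, the `--transfers` and `--checkers` values are illustrative, and `catchup_sync` / `verify_copy` are hypothetical helper names. Note that `rclone sync` deletes destination objects that are missing from the source, so it only suits passes made before Mastodon starts writing to the new bucket; after cutover, `rclone copy` is the safer verb.

```sh
#!/bin/sh
# Assumption: oldminio and secureminio are configured rclone remotes.
SRC="${SRC:-oldminio:linodeassets}"
DST="${DST:-secureminio:mastodonassets-secure}"
RCLONE="${RCLONE:-rclone}"   # override for a dry test, e.g. RCLONE=echo

# Bring the new bucket in line with the old one, picking up anything
# uploaded since the previous pass. Extra arguments (such as --dry-run
# or --progress) are passed straight through to rclone.
catchup_sync() {
    "$RCLONE" sync "$SRC" "$DST" --transfers 16 --checkers 32 "$@"
}

# Check that every object in the old bucket is present in the new one.
# rclone compares sizes, and hashes where both sides provide them.
verify_copy() {
    "$RCLONE" check "$SRC" "$DST" --one-way "$@"
}

# Possible order just before cutover:
#   catchup_sync --dry-run     # preview what would change
#   catchup_sync --progress    # run the real pass
#   verify_copy                # confirm nothing was missed
```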

---

## ✅ Success Criteria Met

- 🔒 All Mastodon assets are now stored, encrypted, in `mastodonassets-secure`
- 🪣 MinIO Console is functional at `https://consolev2.sshjunkie.com`
- 🎯 Mastodon is running with zero visible user impact
- 💾 Snapshot (`nexus/mastodonassets@pre-s3-switch`) taken post-migration for rollback
- 🔁 Future syncs can now be performed cleanly from the backup server instead of the live system
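
The rollback snapshot listed above could be taken and confirmed roughly as follows. This is a sketch only: the dataset and snapshot names come from this postmortem, while the helper names `take_snapshot` and `list_snapshots` are hypothetical, and the commands are assumed to run as a user with ZFS privileges on `shredderv2`.

```sh
#!/bin/sh
DATASET="${DATASET:-nexus/mastodonassets}"
SNAPNAME="${SNAPNAME:-pre-s3-switch}"
ZFS="${ZFS:-zfs}"   # override for a dry test, e.g. ZFS=echo

# Create the named snapshot of the dataset backing the new bucket.
take_snapshot() {
    "$ZFS" snapshot "${DATASET}@${SNAPNAME}"
}

# List the dataset's snapshots to confirm the new one exists.
list_snapshots() {
    "$ZFS" list -t snapshot -r "$DATASET"
}

# Rolling back discards everything written after the snapshot:
#   zfs rollback nexus/mastodonassets@pre-s3-switch
```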

---

## 🧠 Lessons Learned

- Always validate MinIO user policies before go-live
- Avoid redirects in the domain's Nginx `server` block during cert issuance
- ZFS dramatically improves caching performance with rclone VFS
- Post-cutover syncs are crucial for active upload systems like Mastodon
- UI access to MinIO is a lifesaver for emergency fixes, so keep it working
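
The first lesson, validating user policies before go-live, could be scripted with the MinIO client instead of relying on the Console. The sketch below is an illustration, not what was done during the incident (the fix was applied through the Console UI). It assumes `secureminio` is an `mc` alias holding admin credentials, uses the built-in `readwrite` policy purely as an example, and relies on the `mc admin policy attach` syntax of recent `mc` releases (older releases used a different subcommand). The helper names are hypothetical.

```sh
#!/bin/sh
MC="${MC:-mc}"                               # override for a dry test, e.g. MC=echo
ADMIN_ALIAS="${ADMIN_ALIAS:-secureminio}"    # assumption: mc alias with admin credentials
APP_USER="${APP_USER:-genesisadminv2}"
BUCKET="${BUCKET:-mastodonassets-secure}"

# Print the user's status and attached policy for a human to review.
show_user() {
    "$MC" admin user info "$ADMIN_ALIAS" "$APP_USER"
}

# Attach a policy to the user. The default, readwrite, is one of MinIO's
# built-in policies and is only an example; pass another name as $1.
attach_policy() {
    "$MC" admin policy attach "$ADMIN_ALIAS" "${1:-readwrite}" --user "$APP_USER"
}

# End-to-end check: list the bucket through an alias ($1, hypothetical)
# that is configured with the application's own keys. An access-denied
# response makes mc exit non-zero, so this fails before Mastodon does.
can_list_bucket() {
    "$MC" ls "$1/$BUCKET" >/dev/null
}
```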

---

## 🔚 Follow-Up Actions

- [ ] Schedule `certbot renew --standalone` with a systemd timer
- [ ] Rotate MinIO user keys and audit access policies
- [ ] Monitor `/var/log/syslog` for VFS or sparse-file errors
- [ ] Document the rclone mount and caching strategy for SPL and Mastodon
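
One way to approach the first follow-up is a oneshot service plus a timer, written out by a small shell function. This is a sketch under assumptions: the unit names and the `write_renew_units` helper are hypothetical, certbot is assumed to live at `/usr/bin/certbot`, and Nginx is assumed to hold ports 80/443, so it is stopped while the standalone authenticator runs. Packaged certbot installs may already ship their own renewal timer, so check for one before adding a second.

```sh
#!/bin/sh
# Write a oneshot service and a timer for certbot renewal into the
# directory given as $1 (default: /etc/systemd/system).
write_renew_units() {
    unit_dir="${1:-/etc/systemd/system}"

    cat > "$unit_dir/certbot-renew.service" <<'EOF'
[Unit]
Description=Renew certificates with certbot (standalone)

[Service]
Type=oneshot
# The hooks only run when a certificate is actually due for renewal.
ExecStart=/usr/bin/certbot renew --standalone --pre-hook "systemctl stop nginx" --post-hook "systemctl start nginx"
EOF

    cat > "$unit_dir/certbot-renew.timer" <<'EOF'
[Unit]
Description=Run certbot renew twice a day

[Timer]
OnCalendar=*-*-* 00,12:00:00
RandomizedDelaySec=1h
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# After writing the units as root:
#   systemctl daemon-reload
#   systemctl enable --now certbot-renew.timer
```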