🧾 Postmortem: Mastodon Object Storage Migration to Secure S3 (MinIO)
Date: April 30, 2025
Engineer: Doc (Genesis Radio / Genesis Hosting)
🎯 Objective
Migrate Mastodon's object storage from an older MinIO bucket (`linodeassets`) to a new bucket (`mastodonassets-secure`) on a ZFS-backed, encrypted MinIO instance on `shredderv2`, while maintaining uptime and improving storage performance and security.
🧱 Infrastructure Touched
- Mastodon (Docker-based, hosted on Linode)
- MinIO S3 object storage (`oldminio` → `secureminio`)
- Nginx (reverse proxy for the Console and S3 endpoints)
- ZFS dataset: `nexus/mastodonassets`
- Domains:
  - `shredderv2.sshjunkie.com` (S3 API)
  - `consolev2.sshjunkie.com` (MinIO Console UI)
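For reference, the Mastodon side of the switch comes down to the S3 settings in `.env.production`. The excerpt below is a sketch, not the live file: the variable names are Mastodon's documented S3 settings, while the region, the use of `genesisadminv2` as the access key, and the secret placeholder are assumptions. Dotenv lines are plain `KEY=value` assignments, so the excerpt is also valid shell.

```shell
# Hypothetical excerpt of Mastodon's .env.production after cutover.
S3_ENABLED=true
S3_BUCKET=mastodonassets-secure
S3_PROTOCOL=https
S3_HOSTNAME=shredderv2.sshjunkie.com
S3_ENDPOINT=https://shredderv2.sshjunkie.com
# Assumed: MinIO's default region, unless the server sets a different one.
S3_REGION=us-east-1
# Assumed: for a MinIO user, the access key is the user name.
AWS_ACCESS_KEY_ID=genesisadminv2
# Placeholder only; keep the real secret out of version control.
AWS_SECRET_ACCESS_KEY=CHANGE_ME
```

Edits to this file only take effect once the Mastodon containers are recreated (for example with `docker compose up -d --force-recreate`); a plain restart keeps the old environment.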
⚠️ Issues Encountered
- 403 Access Denied on Mastodon startup
  - ✅ Root cause: the `genesisadminv2` MinIO user had no attached policy
  - 🔧 Fixed via the Console UI after re-enabling access
- MinIO Console unreachable (`consolev2.sshjunkie.com`)
  - SSL cert for the domain was missing
  - 🔧 Used `certbot certonly --standalone` to issue a new cert, then re-enabled the full HTTPS proxy
- Sync race conditions
  - Some media files were uploaded to the old bucket during the long transfer
  - 🔧 Mitigated by running an additional `rclone sync` pass before cutover
- Transfer performance bottlenecks
  - MinIO client (`mc mirror`) was too slow
  - ✅ Switched to `rclone`, which was drastically faster
- SPL (StationPlaylist) freezing during asset access
  - Root cause: the rclone VFS cache choking on sparse file writes on ext4
  - ✅ Fix: moved critical rclone mounts to ZFS-backed drives
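The catch-up pass from the sync-race item, plus the `mc mirror` → `rclone` switch, can be sketched as two shell functions. This assumes rclone remotes named `oldminio` and `secureminio` (the names in the infrastructure list) are already configured; the `--transfers`/`--checkers` values are untuned guesses, not the settings used during this migration.

```shell
# Source this file, then run bulk_sync (long initial copy) and final_pass (just before cutover).
# Assumed rclone remotes: "oldminio" and "secureminio".
SRC="oldminio:linodeassets"
DST="secureminio:mastodonassets-secure"

# Initial bulk copy with more parallelism than rclone's defaults (4 transfers, 8 checkers).
# --fast-list trades memory for fewer S3 listing requests.
bulk_sync() {
  rclone sync "$SRC" "$DST" --transfers 16 --checkers 32 --fast-list --progress
}

# Catch-up pass: copies objects uploaded to the old bucket during the bulk copy,
# then confirms every source object exists on the destination with matching size/hash.
final_pass() {
  rclone sync "$SRC" "$DST" --progress || return
  rclone check "$SRC" "$DST" --one-way
}
```

Note that `rclone sync` deletes destination objects that are missing from the source. Once Mastodon starts writing to `mastodonassets-secure`, any further pass from the old bucket should use `rclone copy` instead, or it will remove new uploads.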
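For the SPL freeze, keeping rclone's VFS cache on ZFS rather than ext4 might look like the mount below. `--vfs-cache-mode full`, `--cache-dir`, and `--vfs-cache-max-size` are real rclone flags; the mountpoint, the cache path (assumed to be a ZFS dataset mounted at `/nexus/rclone-cache`), and the 50G cap are hypothetical. rclone's documentation warns that `full` mode performs badly when the cache directory's filesystem does not support sparse files, which is why the guard checks where the cache lives.

```shell
# Source this file, then run mount_assets.
# Hypothetical paths: the mountpoint, and a cache dir assumed to sit on a ZFS dataset.
REMOTE="secureminio:mastodonassets-secure"
MOUNTPOINT="/mnt/mastodonassets"
CACHE_DIR="/nexus/rclone-cache"

# True when CACHE_DIR is on ZFS (GNU stat -f prints the filesystem type name).
cache_on_zfs() {
  [ "$(stat -f -c %T "$CACHE_DIR")" = "zfs" ]
}

mount_assets() {
  mkdir -p "$MOUNTPOINT" "$CACHE_DIR"
  if ! cache_on_zfs; then
    echo "refusing to mount: $CACHE_DIR is not on ZFS" >&2
    return 1
  fi
  # full mode keeps file data in CACHE_DIR, so repeated reads come from local disk
  rclone mount "$REMOTE" "$MOUNTPOINT" \
    --vfs-cache-mode full \
    --cache-dir "$CACHE_DIR" \
    --vfs-cache-max-size 50G \
    --daemon
}
```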
✅ Success Criteria Met
- 🔒 All Mastodon assets are now stored in `mastodonassets-secure` with encryption
- 🪣 MinIO Console functional at https://consolev2.sshjunkie.com
- 🎯 Mastodon is running with zero visible user impact
- 💾 Snapshot (`nexus/mastodonassets@pre-s3-switch`) taken post-migration for rollback
- 🔁 Future syncs can now be performed cleanly from the backup server instead of the live system
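The rollback snapshot from the success criteria can be created, listed, and restored with standard `zfs` commands. A minimal sketch: the `minio` systemd unit name is hypothetical, and stopping MinIO around the rollback is an assumed precaution rather than a documented step from this migration.

```shell
# Source this file, then call take_snapshot or rollback_assets.
DATASET="nexus/mastodonassets"
SNAP="${DATASET}@pre-s3-switch"

take_snapshot() {
  zfs snapshot "$SNAP"
  # -t snapshot -r lists the snapshots of the dataset (and any children)
  zfs list -t snapshot -r "$DATASET"
}

rollback_assets() {
  local rc=0
  # Hypothetical unit name; keeps MinIO from writing while the dataset is rolled back.
  systemctl stop minio
  # Fails if newer snapshots exist; "zfs rollback -r" would destroy them instead.
  zfs rollback "$SNAP" || rc=$?
  systemctl start minio
  return "$rc"
}
```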
🧠 Lessons Learned
- Always validate MinIO user policies before go-live
- Avoid redirects in Nginx `server_name` blocks during cert issuance
- A ZFS-backed cache dramatically improved rclone VFS caching performance
- Post-cutover syncs are crucial for active upload systems like Mastodon
- UI access to MinIO is a lifesaver for emergency fixes; keep it working
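The first lesson can be turned into a pre-flight step: write a bucket-scoped policy, register it, attach it to `genesisadminv2`, and print the user's info before go-live. This sketch assumes an `mc` alias named `secureminio` and a recent `mc` release (`mc admin policy create`/`attach`; older releases used `add`/`set`). The policy name `mastodon-rw` and the action list are assumptions, and Mastodon may need actions beyond these.

```shell
# Source this file, then run apply_policy.
# Assumed mc alias "secureminio"; hypothetical policy name "mastodon-rw".
ALIAS="secureminio"
USER_NAME="genesisadminv2"
BUCKET="mastodonassets-secure"
POLICY_NAME="mastodon-rw"

# Write an S3-style policy limited to one bucket: listing plus object read/write/delete.
write_policy() {
  local out="$1"
  cat > "$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BUCKET}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                 "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"],
      "Resource": ["arn:aws:s3:::${BUCKET}/*"]
    }
  ]
}
EOF
}

apply_policy() {
  local file rc=0
  file="$(mktemp)"
  write_policy "$file"
  mc admin policy create "$ALIAS" "$POLICY_NAME" "$file" &&
    mc admin policy attach "$ALIAS" "$POLICY_NAME" --user "$USER_NAME" || rc=$?
  rm -f "$file"
  [ "$rc" -eq 0 ] || return "$rc"
  # Shows the user's status and attached policies for a final check before go-live.
  mc admin user info "$ALIAS" "$USER_NAME"
}
```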
🔚 Follow-Up Actions
- Schedule `certbot renew --standalone` with a systemd timer
- Rotate MinIO user keys and audit access policies
- Monitor `/var/log/syslog` for VFS or sparse-file errors
- Document the rclone mount and caching strategy for SPL and Mastodon
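The first and third follow-ups could look like the sketch below: a oneshot service and timer for `certbot renew`, plus a grep over syslog. `certbot renew` reuses the `standalone` authenticator saved when the cert was first issued; since standalone mode binds port 80 itself, Nginx is stopped and started through certbot's `--pre-hook`/`--post-hook`, which only run when a renewal is actually attempted. The unit name, the `/usr/bin/certbot` path, and the grep patterns are assumptions. Many certbot packages (distro and snap) already ship a renewal timer, so `systemctl list-timers` is worth checking first.

```shell
# Source this file, then run install_certbot_timer [unit_dir] and scan_vfs_errors [logfile].
# Hypothetical unit name "certbot-renew"; the certbot path may differ (e.g. snap installs).
install_certbot_timer() {
  local unit_dir="${1:-/etc/systemd/system}"
  cat > "$unit_dir/certbot-renew.service" <<'EOF'
[Unit]
Description=Renew Let's Encrypt certificates (standalone authenticator)

[Service]
Type=oneshot
# Nginx holds port 80, so the hooks stop it only while a renewal is attempted.
ExecStart=/usr/bin/certbot renew --quiet --pre-hook "systemctl stop nginx" --post-hook "systemctl start nginx"
EOF
  cat > "$unit_dir/certbot-renew.timer" <<'EOF'
[Unit]
Description=Twice-daily certbot renewal check

[Timer]
OnCalendar=*-*-* 00,12:00:00
RandomizedDelaySec=1h
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Print log lines mentioning the rclone VFS cache or sparse files.
# The patterns are a guess at what the relevant messages contain.
scan_vfs_errors() {
  local log="${1:-/var/log/syslog}" rc=0
  grep -Ei 'vfs cache|sparse' "$log" || rc=$?
  # grep exits 1 when nothing matches (fine) and 2 on real errors (e.g. missing file)
  [ "$rc" -le 1 ]
}
```

After writing the units to `/etc/systemd/system`, `systemctl daemon-reload` followed by `systemctl enable --now certbot-renew.timer` activates the schedule.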