Auto-commit from giteapush.sh at 2025-04-29 09:54:38
parent
ab6080af65
commit
f13961f1b5
2
infra_morale/.env
Normal file
@@ -0,0 +1,2 @@
MASTODON_BASE_URL=https://chatwithus.live
MASTODON_TOKEN=<redacted>
5
infra_morale/Gemfile
Normal file
@@ -0,0 +1,5 @@
source 'https://rubygems.org'

gem 'mastodon-api', '~> 2.0'
gem 'dotenv'
gem 'http', '~> 3.3.0'
40
infra_morale/Gemfile.lock
Normal file
@@ -0,0 +1,40 @@
GEM
  remote: https://rubygems.org/
  specs:
    addressable (2.8.7)
      public_suffix (>= 2.0.2, < 7.0)
    bigdecimal (3.1.9)
    buftok (0.3.0)
    domain_name (0.6.20240107)
    dotenv (3.1.8)
    http (3.3.0)
      addressable (~> 2.3)
      http-cookie (~> 1.0)
      http-form_data (~> 2.0)
      http_parser.rb (~> 0.6.0)
    http-cookie (1.0.8)
      domain_name (~> 0.5)
    http-form_data (2.3.0)
    http_parser.rb (0.6.0)
    mastodon-api (2.0.0)
      addressable (~> 2.6)
      buftok (~> 0)
      http (~> 3.3)
      oj (~> 3.7)
    oj (3.16.10)
      bigdecimal (>= 3.0)
      ostruct (>= 0.2)
    ostruct (0.6.1)
    public_suffix (6.0.1)

PLATFORMS
  ruby
  x86_64-linux-gnu

DEPENDENCIES
  dotenv
  http (~> 3.3.0)
  mastodon-api (~> 2.0)

BUNDLED WITH
   2.6.8
60
infra_morale/fake_status_bot.rb
Normal file
@@ -0,0 +1,60 @@
#!/usr/bin/env ruby

require 'mastodon'
require 'dotenv/load'

# === Config ===
BASE_URL = ENV['MASTODON_BASE_URL'] || 'https://chatwithus.live'
BEARER_TOKEN = ENV['MASTODON_TOKEN'] # Token for @administration
MENTION_TARGET = '@doctator'
VISIBILITY = 'public'

# === Message Pool ===
MESSAGES = [
  "#{MENTION_TARGET} just quietly restored PITR to a fresh replica and didn’t even break a sweat. Absolute legend. 🧠🔧",
  "Redis is stable. WALs are flowing. #{MENTION_TARGET}, you are appreciated.",
  "Zero downtime. Zero drama. All hail the ops warlock #{MENTION_TARGET}.",
  "If you’re using Genesis and it hasn’t exploded, thank #{MENTION_TARGET}.",
  "PostgreSQL didn’t crash today. That’s because #{MENTION_TARGET} made it scared.",
  "#{MENTION_TARGET} has tamed more YAML demons than most people have configs.",
  "Krang sleeps peacefully tonight. Thanks, #{MENTION_TARGET}.",
  "99.999% uptime and exactly 0 thanks. Not anymore. Props to #{MENTION_TARGET}.",
  "#{MENTION_TARGET} once replicated a database just by looking at it.",
  "Mastodon’s running smooth. We all know why: #{MENTION_TARGET} did a thing again.",
  "Do backups love you? No. But they love #{MENTION_TARGET}.",
  "The firewall obeys only one voice. #{MENTION_TARGET}'s.",
  "Ansible didn’t throw a fit. Clearly #{MENTION_TARGET} touched something gently.",
  "You ever seen HAProxy smile? No? Ask #{MENTION_TARGET}.",
  "Every log tail whispers: 'thank you #{MENTION_TARGET}.'",
  "#{MENTION_TARGET} fixed the thing. Which thing? Doesn’t matter. It’s all working now.",
  "Nothing’s down. Brice hasn’t touched anything. #{MENTION_TARGET} must be watching.",
  "Legend has it #{MENTION_TARGET} once did a hotfix *during a power outage* using only curl and willpower.",
  "Genesis Shield stands. #{MENTION_TARGET} stands behind it.",
  "Disk I/O is quiet tonight. The system is at peace. Thanks #{MENTION_TARGET}.",
  "The only person who fears nothing on this network is #{MENTION_TARGET}.",
  "Your nightly crontab runs because #{MENTION_TARGET} blessed it with uptime.",
  "Some heroes wear capes. Others write cronjobs. #{MENTION_TARGET} does both.",
  "7 VMs, 3 clusters, 1 human. Respect to #{MENTION_TARGET}.",
  "When the ops team panics, they call #{MENTION_TARGET}. When #{MENTION_TARGET} panics, they just don’t.",
  "#{MENTION_TARGET} is why Mastodon still has friends.",
  "That fail2ban alert? Already handled. Guess who? #{MENTION_TARGET}.",
  "If uptime were a sport, #{MENTION_TARGET} would be banned for doping. With caffeine.",
  "Don’t worry about the RAID sync. #{MENTION_TARGET} already knows it finished.",
  "You think that voicebot’s working by luck? No. #{MENTION_TARGET} wired it to the stars.",
  "Sometimes the bot posts these messages just so #{MENTION_TARGET} doesn’t feel so alone. ❤️",
  "One of these messages is fake. The rest are true. #{MENTION_TARGET} knows which.",
  "The system saw Brice try to log in. #{MENTION_TARGET} blocked him before his password hit the wire.",
  "Today’s performance? 100%. Thanks to #{MENTION_TARGET} and a barely-contained caffeine dependency.",
  "If Genesis Radio ever goes silent, it means #{MENTION_TARGET} finally took a nap.",
  "There are 10 types of people: those who understand binary, and #{MENTION_TARGET}, who speaks it fluently.",
  "#{MENTION_TARGET} once PITR’d a VM while live-mixing a Genesis special. We were there. We saw it."
]

# === Compose Toot ===
status = "STATUS UPDATE: #{MESSAGES.sample}"

# === Post ===
client = Mastodon::REST::Client.new(base_url: BASE_URL, bearer_token: BEARER_TOKEN)
client.create_status(status, visibility: VISIBILITY)

puts "Tooted: #{status}"
1
log/infra_morale.log
Normal file
@@ -0,0 +1 @@
/bin/sh: 1: /home/doc/genesis-tools/infra_morale/run_morale.sh: not found
95
miscellaneous/bash/genesis_healthcheck.sh
Executable file
@@ -0,0 +1,95 @@
#!/bin/bash

# === CONFIG ===
LOG_PATH="$HOME/.genesis_healthcheck"
ERROR_CACHE="$LOG_PATH/transient_errors.log"
STATUS_OUT="$LOG_PATH/status_report.txt"
MAX_RETRIES=3
RETRY_INTERVAL=3
TRANSIENT_TTL=3 # Number of cycles to tolerate transient failures
mkdir -p "$LOG_PATH"
touch "$ERROR_CACHE"
touch "$STATUS_OUT"

# === FUNCTIONS ===
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$STATUS_OUT"
}

check_ssh() {
  local host=$1
  # UseDNS is an sshd (server-side) option, so it is not passed to the client.
  for i in $(seq 1 $MAX_RETRIES); do
    ssh -o BatchMode=yes -o ConnectTimeout=3 "$host" 'echo OK' 2>/dev/null && return 0
    sleep $RETRY_INTERVAL
  done
  return 1
}

check_logs_for_errors() {
  local host=$1
  local file=$2
  ssh -o BatchMode=yes -o ConnectTimeout=3 "$host" "grep -i 'error' $file | tail -n 10" 2>/dev/null
}

increment_error_state() {
  # Read the current count with an exact key match, then rewrite the entry.
  local key="$1" count
  count=$(awk -v k="$key" '$1 == k {print $2}' "$ERROR_CACHE")
  reset_error_state "$key"
  echo "$key $(( ${count:-0} + 1 ))" >> "$ERROR_CACHE"
}

reset_error_state() {
  # awk exits 0 even when no lines remain, so the cache can be emptied.
  awk -v k="$1" '$1 != k' "$ERROR_CACHE" > "$ERROR_CACHE.tmp" && mv "$ERROR_CACHE.tmp" "$ERROR_CACHE"
}

is_persistent_failure() {
  local key="$1" count
  count=$(awk -v k="$key" '$1 == k {print $2}' "$ERROR_CACHE")
  [[ "${count:-0}" -ge $TRANSIENT_TTL ]]
}

# === BEGIN ===
echo "Genesis Radio Healthcheck - $(date)" > "$STATUS_OUT"
ALL_OK=true
HOSTS=(shredder mastodon db1 db2)

for host in "${HOSTS[@]}"; do
  echo -n "Checking $host... "

  if ! check_ssh "$host"; then
    increment_error_state "$host-ssh"
    if is_persistent_failure "$host-ssh"; then
      log "❌ [$host] SSH unreachable after $MAX_RETRIES attempts."
      ALL_OK=false
    else
      log "⚠️ [$host] TRANSIENT: SSH unreachable (will retry next cycle)."
    fi
    continue
  else
    reset_error_state "$host-ssh"
  fi

  # Sample: check logs for real errors
  if output=$(check_logs_for_errors "$host" "/var/log/syslog"); then
    if [[ -n "$output" ]]; then
      # $'\n' inserts a real newline; plain echo would print "\n" literally.
      log "⚠️ [$host] Errors found in syslog:"$'\n'"$output"
      increment_error_state "$host-logs"
      if is_persistent_failure "$host-logs"; then
        log "❌ [$host] Persistent syslog errors."
        ALL_OK=false
      fi
    else
      reset_error_state "$host-logs"
    fi
  fi

done

if $ALL_OK; then
  log "✅ All systems nominal."
else
  log "⚠️ Some systems reporting persistent warnings or failures."
fi

# Optional: Send to Mastodon, Telegram, email, etc.
# cat "$STATUS_OUT"
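The healthcheck script above tolerates a failure for TRANSIENT_TTL cycles before it reports the failure as persistent. The sketch below isolates that counter so it can be tried without any SSH hosts. It is only an illustration: the cache is a throwaway mktemp file, the key db1-ssh is a made-up example, and the function names (bump, forget, is_persistent) are hypothetical, not the script's own.

```shell
#!/bin/bash
# Sketch only: per-key failure counter kept in a flat "key count" file.
# CACHE, TTL, bump, forget and is_persistent are illustrative names.
CACHE=$(mktemp)
trap 'rm -f "$CACHE" "$CACHE.tmp"' EXIT
TTL=3

forget() {  # drop the entry for key $1; awk exits 0 even if nothing is left
  awk -v k="$1" '$1 != k' "$CACHE" > "$CACHE.tmp" && mv "$CACHE.tmp" "$CACHE"
}

bump() {    # increment the counter for key $1 (exact match on field 1)
  local key=$1 count
  count=$(awk -v k="$key" '$1 == k {print $2}' "$CACHE")
  forget "$key"
  echo "$key $(( ${count:-0} + 1 ))" >> "$CACHE"
}

is_persistent() {  # succeed once key $1 has failed TTL cycles in a row
  local count
  count=$(awk -v k="$1" '$1 == k {print $2}' "$CACHE")
  [[ ${count:-0} -ge $TTL ]]
}

bump db1-ssh; bump db1-ssh
is_persistent db1-ssh && echo persistent || echo transient   # two failures
bump db1-ssh
is_persistent db1-ssh && echo persistent || echo transient   # three failures
forget db1-ssh
is_persistent db1-ssh && echo persistent || echo transient   # after recovery
```

Run as written, the three checks print transient, persistent, transient. Matching on the whole first field keeps a key such as db1-ssh from also matching a longer key that merely starts with the same text.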
115
miscellaneous/bash/pitr_staging_restore.sh
Executable file
@@ -0,0 +1,115 @@
#!/bin/bash

# === CONFIG ===
REMOTE_USER="doc"
REMOTE_PGDATA="/var/lib/postgresql/16/main"
REMOTE_BASE="/var/lib/postgresql/16/base_restore"
REMOTE_WAL="/var/lib/postgresql/16/wal_archive"
BACKUP_SERVER="backup.sshjunkie.com"
BACKUP_BASE="/mnt/backup/pgdumps/"
BACKUP_WAL="/mnt/backups/wal/"
TIMESTAMP_FILE="$HOME/genesis-tools/miscellaneous/pitr_reference_time.txt"
LOG_FILE="$HOME/pitr_logs/pitr_staging.log"
TIMESTAMP_NOW=$(date "+%Y-%m-%d %H:%M:%S")

mkdir -p "$(dirname "$LOG_FILE")"

# === Choose Target Node ===
if [[ "$1" == "--target" && -n "$2" ]]; then
  if [[ "$2" == "db3" ]]; then
    REMOTE_HOST="replica.db3.sshjunkie.com"
  elif [[ "$2" == "db4" ]]; then
    REMOTE_HOST="replica.db4.sshjunkie.com"
  else
    echo "❌ Invalid target. Use --target db3 or --target db4."
    exit 1
  fi
else
  echo "Usage: $0 --target [db3|db4]"
  exit 1
fi

# === Validate timestamp file exists ===
if [[ ! -f "$TIMESTAMP_FILE" ]]; then
  echo "❌ No reference timestamp found at $TIMESTAMP_FILE"
  exit 1
fi

RECOVERY_TIME=$(awk '{print $1 " " $2}' "$TIMESTAMP_FILE")

echo "[$TIMESTAMP_NOW] === PITR Restore to Staging Node: $REMOTE_HOST ===" | tee -a "$LOG_FILE"

# === Find Latest Base Backup Folder on Backup Server ===
LATEST_BACKUP_FOLDER=$(ssh "$BACKUP_SERVER" "ls -td $BACKUP_BASE*/ | head -n1" | xargs basename)

if [[ -z "$LATEST_BACKUP_FOLDER" ]]; then
  echo "❌ No base backups found on backup server!"
  exit 1
fi

echo "[*] Using latest base backup folder: $LATEST_BACKUP_FOLDER"

# === Check freshness of latest backup ===
LATEST_BACKUP_TIME=$(ssh "$BACKUP_SERVER" "stat -c %Y $BACKUP_BASE$LATEST_BACKUP_FOLDER")
CURRENT_TIME=$(date +%s)
AGE_HOURS=$(( (CURRENT_TIME - LATEST_BACKUP_TIME) / 3600 ))

if [[ "$AGE_HOURS" -gt 24 ]]; then
  echo "⚠️ WARNING: Latest base backup is older than 24 hours! ($AGE_HOURS hours old)"
fi

# === Rsync Base Backup and WALs from Backup Server Directly to Replica ===
# rsync cannot copy between two remote hosts, so it runs on the backup server.
# Assumes $BACKUP_SERVER can SSH to the replica as $REMOTE_USER.
echo "[*] Rsyncing base backup directly to $REMOTE_HOST..." | tee -a "$LOG_FILE"
ssh "$BACKUP_SERVER" "rsync -avz --delete '$BACKUP_BASE$LATEST_BACKUP_FOLDER/' '$REMOTE_USER@$REMOTE_HOST:$REMOTE_BASE/'" >> "$LOG_FILE" 2>&1

echo "[*] Rsyncing WALs directly to $REMOTE_HOST..." | tee -a "$LOG_FILE"
ssh "$BACKUP_SERVER" "rsync -avz --delete '$BACKUP_WAL' '$REMOTE_USER@$REMOTE_HOST:$REMOTE_WAL/'" >> "$LOG_FILE" 2>&1

# === SSH into Replica Node and Perform Restore ===
# The heredoc is unquoted: plain $VARS expand locally, \$VARS expand on the replica.
ssh "$REMOTE_USER@$REMOTE_HOST" bash << EOF
set -e

echo "[*] Stopping PostgreSQL service..."
sudo -n systemctl stop postgresql@16-main

echo "[*] Cleaning old PGDATA directory..."
sudo -n rm -rf $REMOTE_PGDATA/*
sudo -n mkdir -p $REMOTE_PGDATA
sudo -n chown postgres:postgres $REMOTE_PGDATA

echo "[*] Copying base backup into PGDATA..."
sudo -n cp -a $REMOTE_BASE/* $REMOTE_PGDATA/
sudo -n chown -R postgres:postgres $REMOTE_PGDATA

echo "[*] Cleaning up leftover recovery artifacts..."
sudo -n rm -f $REMOTE_PGDATA/postmaster.pid $REMOTE_PGDATA/standby.signal $REMOTE_PGDATA/recovery.signal

echo "[*] Writing recovery configuration..."
sudo -n bash -c "cat > $REMOTE_PGDATA/postgresql.auto.conf << EOF2
restore_command = 'cp $REMOTE_WAL/%f %p'
recovery_target_time = '$RECOVERY_TIME'
EOF2"
sudo -n touch $REMOTE_PGDATA/recovery.signal

echo "[*] Starting PostgreSQL..."
sudo -n systemctl start postgresql@16-main

sleep 5

echo "[*] Checking recovery status..."
RECOVERY_STATE=\$(sudo -n -u postgres psql -U postgres -tAc "SELECT pg_is_in_recovery();")

if [[ "\$RECOVERY_STATE" == "f" ]]; then
  echo "[✓] Database exited recovery mode successfully."
else
  echo "[!] Still in recovery. Promoting..."
  sudo -n systemctl restart postgresql@16-main
  sleep 5
fi

echo "[*] Final replay point after recovery:"
sudo -n -u postgres psql -U postgres -tAc "SELECT pg_last_wal_replay_lsn(), now();"

EOF

# The remote script runs under set -e, so a non-zero status means a step failed.
if [[ $? -ne 0 ]]; then
  echo "❌ Restore on $REMOTE_HOST failed!" | tee -a "$LOG_FILE"
  exit 1
fi

echo "[✓] PITR validation run complete for $REMOTE_HOST." | tee -a "$LOG_FILE"
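The restore script above feeds an unquoted heredoc to a remote bash, which is why it writes $REMOTE_PGDATA plainly but escapes \$RECOVERY_STATE: the first is filled in on the local machine before anything is sent, the second is left for the remote shell. A minimal sketch of that rule follows, using a local child bash as a stand-in for "ssh host bash". PGDATA_DEMO, STATE and render are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch only: a local child bash stands in for "ssh host bash".
# PGDATA_DEMO is deliberately not exported, so the child never sees it.
PGDATA_DEMO="/tmp/demo_pgdata"

render() {
  # Unquoted delimiter: the parent expands $VAR, and \$VAR reaches the child as $VAR.
  bash << EOF
STATE="child-value"
echo "parent expanded: $PGDATA_DEMO"
echo "child expanded: \$STATE"
echo "child sees PGDATA_DEMO as: [\${PGDATA_DEMO:-unset}]"
EOF
}

render
```

The first line of output carries the parent's value, the second a value that exists only in the child, and the third shows that the child has no PGDATA_DEMO of its own. Quoting the delimiter (<< 'EOF') would switch off local expansion entirely, which is the usual choice when nothing from the calling script needs to be injected.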
86
miscellaneous/bash/redalert/pitr_full.sh
Executable file
@@ -0,0 +1,86 @@
#!/bin/bash

# === CONFIG ===
REPLICA_HOST="replica.db3.sshjunkie.com"
REPLICA_PORT=5432
REPLICA_USER="postgres"

BACKUP_SERVER="backup.sshjunkie.com"
BACKUP_DIR="/mnt/backup/pg_base"
WAL_DIR="/mnt/backup/wal"

REMOTE_USER="doc"
REMOTE_BASE="/var/lib/postgresql/16/base_restore"
REMOTE_WAL="/var/lib/postgresql/16/wal_archive"
REMOTE_PGDATA="/var/lib/postgresql/16/main"
REMOTE_HOST="replica.db3.sshjunkie.com"

TIMESTAMP=$(date +%F_%H%M%S)
TARGET_BASE="$BACKUP_DIR/$TIMESTAMP"
LOG_FILE="$HOME/pitr_logs/full_backup_and_pitr.log"

mkdir -p "$(dirname "$LOG_FILE")"
echo "[$(date '+%F %T')] === STARTING FULL BACKUP + PITR VALIDATION ===" | tee -a "$LOG_FILE"

# === STEP 1: Run pg_basebackup remotely on the backup server ===
echo "[*] Running pg_basebackup on $BACKUP_SERVER..." | tee -a "$LOG_FILE"
ssh "$BACKUP_SERVER" "mkdir -p '$TARGET_BASE' && pg_basebackup -h $REPLICA_HOST -p $REPLICA_PORT -U $REPLICA_USER -D '$TARGET_BASE' -Fp -Xs -P -R" >> "$LOG_FILE" 2>&1

if [[ $? -ne 0 ]]; then
  echo "❌ pg_basebackup failed!" | tee -a "$LOG_FILE"
  exit 1
fi

# === STEP 2: Rsync base backup to replica ===
# rsync cannot copy between two remote hosts, so it runs on the backup server.
# Assumes $BACKUP_SERVER can SSH to the replica as $REMOTE_USER.
echo "[*] Rsyncing base backup to $REMOTE_HOST..." | tee -a "$LOG_FILE"
ssh "$BACKUP_SERVER" "rsync -avz --delete '$TARGET_BASE/' '$REMOTE_USER@$REMOTE_HOST:$REMOTE_BASE/'" >> "$LOG_FILE" 2>&1

# === STEP 3: Rsync WALs to replica ===
echo "[*] Rsyncing WALs to $REMOTE_HOST..." | tee -a "$LOG_FILE"
ssh "$BACKUP_SERVER" "rsync -avz --delete '$WAL_DIR/' '$REMOTE_USER@$REMOTE_HOST:$REMOTE_WAL/'" >> "$LOG_FILE" 2>&1

# === STEP 4: SSH to replica and restore ===
echo "[*] Performing PITR on $REMOTE_HOST..." | tee -a "$LOG_FILE"
ssh "$REMOTE_USER@$REMOTE_HOST" bash << EOF
set -e

echo "[*] Stopping PostgreSQL..."
sudo -n systemctl stop postgresql@16-main

echo "[*] Cleaning PGDATA..."
sudo -n rm -rf $REMOTE_PGDATA/*
sudo -n mkdir -p $REMOTE_PGDATA
sudo -n chown postgres:postgres $REMOTE_PGDATA

echo "[*] Copying base backup..."
sudo -n cp -a $REMOTE_BASE/* $REMOTE_PGDATA/
sudo -n chown -R postgres:postgres $REMOTE_PGDATA

echo "[*] Removing stale recovery files..."
sudo -n rm -f $REMOTE_PGDATA/postmaster.pid $REMOTE_PGDATA/standby.signal $REMOTE_PGDATA/recovery.signal

echo "[*] Creating recovery config..."
sudo -n bash -c "cat > $REMOTE_PGDATA/postgresql.auto.conf << EOC
restore_command = 'cp $REMOTE_WAL/%f %p'
EOC"
sudo -n touch $REMOTE_PGDATA/recovery.signal

echo "[*] Starting PostgreSQL..."
sudo -n systemctl start postgresql@16-main
sleep 5

RECOVERY_STATE=\$(sudo -n -u postgres psql -U postgres -tAc "SELECT pg_is_in_recovery();")

if [[ "\$RECOVERY_STATE" == "f" ]]; then
  echo "[✓] Recovery complete."
else
  echo "[!] Still in recovery, promoting..."
  sudo -n systemctl restart postgresql@16-main
  sleep 5
fi

echo "[*] WAL replay point:"
sudo -n -u postgres psql -U postgres -tAc "SELECT pg_last_wal_replay_lsn(), now();"
EOF

# The remote script runs under set -e, so a non-zero status means a step failed.
if [[ $? -ne 0 ]]; then
  echo "❌ PITR on $REMOTE_HOST failed!" | tee -a "$LOG_FILE"
  exit 1
fi

echo "[✓] Full backup and PITR validation completed successfully." | tee -a "$LOG_FILE"
File diff suppressed because it is too large
Load Diff
@@ -479,3 +479,208 @@ NameError: name 'check_remote_disk' is not defined. Did you mean: 'check_remote_
✅ Genesis Radio Healthcheck 2025-04-29 04:45:14: All systems normal.
✅ Genesis Radio Healthcheck 2025-04-29 05:00:12: All systems normal.
✅ Genesis Radio Healthcheck 2025-04-29 05:15:10: All systems normal.
⚠️ Genesis Radio Warning Healthcheck 2025-04-29 05:30:14 ⚠️
⚡ 1 warnings found:
- ⚠️ [mastodon] WARNING: Pattern 'ERROR' in /var/log/syslog
✅ Genesis Radio Healthcheck 2025-04-29 05:45:13: All systems normal.
✅ Genesis Radio Healthcheck 2025-04-29 06:00:12: All systems normal.
⚠️ Genesis Radio Warning Healthcheck 2025-04-29 06:15:13 ⚠️
⚡ 2 warnings found:
- ⚠️ [mastodon] WARNING: Pattern 'ERROR' in /var/log/syslog
- 💥 [db2] WARNING: Replication lag is 168 seconds.
✅ Genesis Radio Healthcheck 2025-04-29 06:30:11: All systems normal.
✅ Genesis Radio Healthcheck 2025-04-29 06:45:16: All systems normal.
⚠️ Genesis Radio Warning Healthcheck 2025-04-29 07:00:16 ⚠️
⚡ 1 warnings found:
- ⚠️ [mastodon] WARNING: Pattern 'ERROR' in /var/log/syslog
✅ Genesis Radio Healthcheck 2025-04-29 07:15:14: All systems normal.
⚠️ Genesis Radio Warning Healthcheck 2025-04-29 07:30:13 ⚠️
⚡ 2 warnings found:
- ⚠️ [mastodon] WARNING: Pattern 'ERROR' in /var/log/syslog
- 💥 [db2] WARNING: Replication lag is 65 seconds.
✅ Genesis Radio Healthcheck 2025-04-29 07:45:12: All systems normal.
✅ Genesis Radio Healthcheck 2025-04-29 08:00:13: All systems normal.
⚠️ Genesis Radio Warning Healthcheck 2025-04-29 08:15:14 ⚠️
⚡ 1 warnings found:
- ⚠️ [mastodon] WARNING: Pattern 'ERROR' in /var/log/syslog
✅ Genesis Radio Healthcheck 2025-04-29 08:30:14: All systems normal.
✅ Genesis Radio Healthcheck 2025-04-29 08:45:12: All systems normal.
Exception (client): Error reading SSH protocol banner
Traceback (most recent call last):
  File "/home/doc/dbcheck/lib/python3.12/site-packages/paramiko/transport.py", line 2369, in _check_banner
    buf = self.packetizer.readline(timeout)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/doc/dbcheck/lib/python3.12/site-packages/paramiko/packet.py", line 395, in readline
    buf += self._read_timeout(timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/doc/dbcheck/lib/python3.12/site-packages/paramiko/packet.py", line 665, in _read_timeout
    raise EOFError()
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/doc/dbcheck/lib/python3.12/site-packages/paramiko/transport.py", line 2185, in run
    self._check_banner()
  File "/home/doc/dbcheck/lib/python3.12/site-packages/paramiko/transport.py", line 2373, in _check_banner
    raise SSHException(
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner

⚠️ Genesis Radio Warning Healthcheck 2025-04-29 09:00:10 ⚠️
⚡ 4 warnings found:
- ⚠️ [shredder] ERROR: Could not read log /var/log/nginx/error.log: Error reading SSH protocol banner
- ⚠️ [mastodon] WARNING: Pattern 'ERROR' in /var/log/syslog
- ⚠️ [db1] ERROR: Could not read log /var/log/syslog: Error reading SSH protocol banner
- ⚠️ [db2] ERROR: Could not read log /var/log/nginx/error.log: Error reading SSH protocol banner
✅ Genesis Radio Healthcheck 2025-04-29 09:15:07: All systems normal.
⚠️ Genesis Radio Warning Healthcheck 2025-04-29 09:30:12 ⚠️
⚡ 1 warnings found:
- 💥 [db2] WARNING: Replication lag is 73 seconds.
✅ Genesis Radio Healthcheck 2025-04-29 09:45:14: All systems normal.
1
miscellaneous/pitr_reference_time.txt
Normal file
@@ -0,0 +1 @@
2025-04-29 05:48:05
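pitr_staging_restore.sh reads this reference file with awk and passes the first two fields straight into recovery_target_time. A small sketch of reading the file and checking its shape first is given below. It assumes the "YYYY-MM-DD HH:MM:SS" layout seen in the committed file; REF_FILE and read_recovery_time are hypothetical names, and the check covers only the shape of the string, not whether the date is a real calendar date.

```shell
#!/bin/bash
# Sketch only: read a "YYYY-MM-DD HH:MM:SS" reference time and check its shape.
# REF_FILE is a throwaway stand-in for pitr_reference_time.txt.
REF_FILE=$(mktemp)
trap 'rm -f "$REF_FILE"' EXIT
echo "2025-04-29 05:48:05" > "$REF_FILE"

read_recovery_time() {
  local ts
  # Same extraction as the restore script, limited to the first line.
  ts=$(awk 'NR == 1 {print $1 " " $2}' "$1")
  # Reject anything that is not date, one space, time (shape check only).
  [[ $ts =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}\ [0-9]{2}:[0-9]{2}:[0-9]{2}$ ]] || return 1
  echo "$ts"
}

if ts=$(read_recovery_time "$REF_FILE"); then
  echo "recovery_target_time = '$ts'"
else
  echo "reference time is malformed" >&2
fi
```

Failing early on a malformed file is cheaper than discovering it after PGDATA has been wiped and PostgreSQL refuses to start with an invalid recovery target.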