Upgrading PostgreSQL from version 11 to 13 using the pg_upgrade tool
- Before the upgrade
-
Upgrade procedure
- Disaster recovery
Before the upgrade
This article explains, how to upgrade PostgreSQL from 11 to 13 using the pg_upgrade tool.
Before the upgrade, the BSS database will be checked for tables with OID columns. As these columns are not supported starting PostgreSQL version 12, such tables, if found, must be updated as explained below.
A pre-check detecting tables with OID columns is performed automatically before the upgrade, however, it is possible to run it manually. To do this, complete the following steps:
- Connect to the BSS database.
- Run the following query:
select c.relname
, c.reltuples
, pg_size_pretty(c.relpages::bigint * 8192) table_size
from pg_class c
join pg_namespace ns on (ns.oid = c.relnamespace)
where 1=1
and c.relkind = 'r'
and ns.nspname = 'public'
and exists (select 1 from pg_attribute a where a.attrelid = c.oid and a.attnum <= 0 and a.attname = 'oid')
order by c.relpages desc;
- An empty result indicates that the database does not contain tables with OID columns and the upgrade can be started.
-
If the result is not empty, and there are tables with OID columns in the BSS database, you must disable OID option for the tables before starting the upgrade. To do this, execute the following query:
ALTER TABLE <tablename> SET WITHOUT OIDS;
Note: The ALTER TABLE statement locks the table exclusively and re-writes the whole table. This can take significant time and consume much disk space for large tables.
Upgrade procedure
The upgrade procedure consists of two stages, as the replica is not upgraded along with the master database:
- Master database upgrade
- Replica upgrade
Master database upgrade
To upgrade PostgreSQL from version 11 to 13 for ClodBlue, complete the following steps.
-
Login to a workstation with access to the Kubernetes cluster, where CloudBlue is deployed, and stop all applications that use the PostgreSQL database, including OSS, BSS, and other microservices. Depending on the infrastructure, the commands to do that can be different. These are example commands:
kubectl scale statefulset --replicas=0 oss-node
kubectl scale deploy --all --replicas=0 -n <NS>
Note: The subsequent commands must be executed on BSS and OSS database hosts.
- Install PostgreSQL 13.
-
Stop the puppet service using the following command:
systemctl stop puppet
-
Download the postgresql-13 packages for the latest minor release (in our example, for the 13.7 version released on May 2022) and install them:
postgresql13-13.7-1PGDG.rhel7.x86_64.rpm
postgresql13-contrib-13.7-1PGDG.rhel7.x86_64.rpm
postgresql13-libs-13.7-1PGDG.rhel7.x86_64.rpm
postgresql13-server-13.7-1PGDG.rhel7.x86_64.rpm
-
Download the packages from the https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7-x86_64/ repository to the current directory and install them using the following command as an example:
yum localinstall ./postgresql13-13.7-1PGDG.rhel7.x86_64.rpm ./postgresql13-contrib-13.7-1PGDG.rhel7.x86_64.rpm ./postgresql13-libs-13.7-1PGDG.rhel7.x86_64.rpm ./postgresql13-server-13.7-1PGDG.rhel7.x86_64.rpm
-
Initialize a new PostgreSQL cluster using this command as an example:
su - postgres -c '/usr/pgsql-13/bin/initdb --locale=en_US.UTF-8 --pgdata=/var/lib/pgsql/13/data'
-
Ensure that free disk space available in the /var/lib/pgsql/13/data directory is at least 1.2 (20 percent) greater than the disk space used in the /var/lib/pgsql/11/data directory. This is necessary for a successful upgrade. Below are example commands that can be used to find out the disk space used and available:
-
Sample command:
du -hs /var/lib/pgsql/11/data
Sample output:
799M /var/lib/pgsql/11/data
-
Sample command:
df -h /var/lib/pgsql/13/data
Sample output:
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 30G 3,6G 26G 13% /
-
Back up the data.
-
Create a directory to store the dump. Below are example commands:
mkdir /opt/dumps/host
chown -R postgres:postgres /opt/dumps/host
-
Check the directory. Below is an example command
df -h /opt/dumps/host
-
Export the data. Below is an example command
export cpu_cnt=`grep -c "processor" /proc/cpuinfo` &&
jobs=$(( $cpu_cnt * 3 / 2 )); echo "jobs=$jobs" &&
echo "Start at "`date` &&
dump_path="/opt/dumps/host" &&
pg_13_bin="/usr/pgsql-13/bin" &&
cd ${dump_path} &&
for DB in $(psql -U postgres -d postgres -t -c "select datname from pg_database where not datname ~ 'template|oids|test|pass'");
do echo "${dump_path}/${DB} `date`";
su postgres -c "mkdir ${dump_path}/${DB}";
su postgres -c "${pg_13_bin}/pg_dump -d ${DB} -N pg-exporter-bss -N pg-exporter-oss -Z 1 -j ${jobs} -F d -v -C -f ${dump_path}/${DB}";
done &&
su postgres -c "${pg_13_bin}/pg_dumpall -r -U postgres -f ${dump_path}/users_n_roles.sql" &&
echo "Finish at"`date`
- Upgrade the data using the pg_upgrade tool.
-
Check a cast for OID columns exists. Below is an example query that you can use:
psql -U postgres -d postgres -t -c "select p.proname from pg_cast c join pg_proc p on (p.oid = c.castfunc) where p.proname = 'varchar2uuid'"
If the result is varchar2uuid, the cast exists and the data can be created in the PostgreSQL 13 database.
-
Stop PostgreSQL 11 service. You can use the following command:
systemctl stop postgresql-11
-
Check that there are no postgres processes running on the database host:
ps -ef|grep "postgres"|grep -v "grep"
Warning! If any process was found, you must identify, disconnect, and stop found application(s). After that, repeat this step to stop PostgreSQL 11 service.
- Temporary enable local access to all databases in cluster only for postgres and back up PostgreSQL configuration: pg_hba.conf. Use the following commands:
cp /var/lib/pgsql/11/data/pg_hba.conf /root/pg_hba.conf_bak_`date '+%Y-%m-%d'`
su - postgres -c 'cp /var/lib/pgsql/11/data/pg_hba.conf /var/lib/pgsql/11/data/pg_hba.conf.prod'
su - postgres -c 'echo "local all postgres trust" >/var/lib/pgsql/11/data/pg_hba.conf'
-
Copy configuration files from PostgreSQL 11 to PostgreSQL 13 server:
su - postgres -c 'cp /var/lib/pgsql/11/data/pg_hba.conf* /var/lib/pgsql/13/data'
su - postgres -c 'cp /var/lib/pgsql/11/data/pg_ident.conf /var/lib/pgsql/13/data'
su - postgres -c 'cp /var/lib/pgsql/11/data/postgresql.conf /var/lib/pgsql/13/data'
su - postgres -c 'cp /var/lib/pgsql/11/data/server.* /var/lib/pgsql/13/data'
-
If required, enable PostgreSQL 13 features:
If the database host has more than 8 CPUs, then the max_worker_processes must be set to the number of CPUs minus 1. The default value is 8.
Only if the database host has an SSD disk, set effective_io_concurrency to 100. This is applicable only to SSD disks.
- The following parameters are added in PostgreSQL 13:
autovacuum_vacuum_insert_scale_factor=0.2
autovacuum_vacuum_insert_threshold=1000
hash_mem_multiplier=1
-
Run pg_upgrade in check mode.
su - postgres -c '/usr/pgsql-13/bin/pg_upgrade --jobs=8 --new-port=8352 --old-port=8352 --check --old-datadir "/var/lib/pgsql/11/data" --new-datadir "/var/lib/pgsql/13/data" --old-bindir "/usr/pgsql-11/bin" --new-bindir "/usr/pgsql-13/bin"'
All results must be positive like in the example below:
#su - postgres -c '/usr/pgsql-13/bin/pg_upgrade --jobs=8 --new-port=8352 --old-port=8352 --check --old-datadir "/var/lib/pgsql/11/data" --new-datadir "/var/lib/pgsql/13/data" --old-bindir "/usr/pgsql-11/bin" --new-bindir "/usr/pgsql-13/bin"'
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for tables WITH OIDS ok
Checking for invalid "sql_identifier" user columns ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok
*Clusters are compatible*
-
If there are OID columns in the billing database, the response will be similar to the one below and you will need to update the tables to remove OIDs.
# su - postgres -c '/usr/pgsql-13/bin/pg_upgrade --jobs=8 --new-port=8352 --old-port=8352 --check --old-datadir "/var/lib/pgsql/11/data" --new-datadir "/var/lib/pgsql/13/data" --old-bindir "/usr/pgsql-11/bin" --new-bindir "/usr/pgsql-13/bin"'
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for tables WITH OIDS fatal
Your installation contains tables declared WITH OIDS, which is not
supported anymore. Consider removing the oid column using
ALTER TABLE ... SET WITHOUT OIDS;
A list of tables with the problem is in the file:
tables_with_oids.txt
Failure, exiting
- Skip this step if no columns with OIDs were found in the billing database. Otherwise, complete the steps below to make the necessary corrections.
-
Prepare the following script to be executed in the PostgreSQL 11 database. Complete the following commands:s
sed 's/In database:/\\c/g' /var/lib/pgsql/tables_with_oids.txt | awk -F '.' '{if ($0 ~ '/\c\ /') {print $0} else {print "alter table "$1".\""$2"\" set without oids;"}}' > /var/lib/pgsql/alter_tables_without_oids.sql
chown postgres /var/lib/pgsql/alter_tables_without_oids.sql
-
Run the following commands to apply the fix:
systemctl start postgresql-11 su - postgres -c 'psql -U postgres -d pba -f /var/lib/pgsql/alter_tables_without_oids.sql' systemctl stop postgresql-11
- Check if there are still columns with OIDs by completing step 5.g again.
-
Execute the upgrade using the following command:
su - postgres -c '/usr/pgsql-13/bin/pg_upgrade --jobs=8 --new-port=8352 --old-port=8352 --old-datadir "/var/lib/pgsql/11/data" --new-datadir "/var/lib/pgsql/13/data" --old-bindir "/usr/pgsql-11/bin" --new-bindir "/usr/pgsql-13/bin"'
-
Restore the PostgreSQL configuration by executing the following commands:
su - postgres -c 'mv /var/lib/pgsql/11/data/pg_hba.conf.prod /var/lib/pgsql/11/data/pg_hba.conf'
su - postgres -c 'mv /var/lib/pgsql/13/data/pg_hba.conf.prod /var/lib/pgsql/13/data/pg_hba.conf'
- More details on the upgrade from PostgreSQL 11 to PostgreSQL 13 can be found in the PostgreSQL documentation.
-
Start PostgreSQL 13 service.
systemctl enable postgresql-13
systemctl start postgresql-13
-
Disable PostgreSQL 11:
systemctl disable postgresql-11
-
If the pg_stat_statements extension is used, upgrade it using the script below as an example:
do $$
begin
--
if exists (
select 1
from pg_extension
where extname = 'pg_stat_statements'
and extversion = '1.6'
)
then
execute 'drop extension pg_stat_statements cascade';
execute 'create extension pg_stat_statements';
end if;
--
end $$;
\dx
-
Reindex and analyze the new database cluster.
su - postgres -c '/usr/pgsql-13/bin/reindexdb --all --jobs=8'
su - postgres -c '/usr/pgsql-13/bin/vacuumdb --all --jobs=8 --analyze-in-stages'
-
If you use CloudBlue core components of version 21.16 or later, skip this step.
If a cast was used in PostgreSQL 11, create it in PostgreSQL 13:
/usr/pgsql-13/bin/psql -U postgres -d postgres -t -c "select * from setup_varchar2uuid(false)"
-
Start CloudBlue applications that use PostgreSQL. Depending on the installation, the component list can be different. Use this command as an example:
kubectl scale statefulset --replicas=1 oss-node
kubectl scale --replicas=1 deployment/idp
Check the applications' health. In case of an issue, refer to the Disaster Recovery section.
-
Repair DB Replication after upgrade to PostgreSQL 13:
-
Identify master DB replica hosts. This is a sample command to do this:
grep -E "^host\s*replication\s*all\s*([0-9\.]*)" /var/lib/pgsql/11/data/pg_hba.conf |sed -E "s/.*all\s*([0-9\.]*).*/\1/"
10.26.230.128
Before restoring the replica, set the proper value for the wal_keep_size configuration parameter and it defines the size of the WAL logs to be kept on the master database server. In PostgreSQL 11 this parameter had the name wal_keep_segments.
To calculate the value for the wal_keep_size parameter, multiply the wal_keep_segments value by the wal_segment_size value. Specify the result value as the value for the wal_keep_size parameter in the /var/lib/pgsql/13/data/postgresql.conf file.
-
To apply the changes, reload the postgres-13 service
systemctl reload postgresql-13
-
Login to the replication host and install PostgreSQL 13 there:
yum localinstall ./postgresql13-13.6-1PGDG.rhel7.x86_64.rpm ./postgresql13-contrib-13.6-1PGDG.rhel7.x86_64.rpm ./postgresql13-libs-13.6-1PGDG.rhel7.x86_64.rpm ./postgresql13-server-13.6-1PGDG.rhel7.x86_64.rpm
-
Stop and disable PostgreSQL 11 service:
systemctl stop postgresql-11
systemctl disable postgresql-11
-
Make sure that there are no processes of PostgreSQL 11 running. To do this, execute this command on each server:
ps -ef | grep "postgres"| grep -v "grep"
-
Restore replication configuration for PostgreSQL 13
su - postgres -c '/usr/pgsql-13/bin/initdb --locale=en_US.UTF-8 --pgdata=/var/lib/pgsql/13/data'
su - postgres -c 'cp /var/lib/pgsql/11/data/pg_hba.conf /var/lib/pgsql/13/data'
su - postgres -c 'cp /var/lib/pgsql/11/data/pg_ident.conf /var/lib/pgsql/13/data'
su - postgres -c 'cp /var/lib/pgsql/11/data/postgresql.conf /var/lib/pgsql/13/data'
-
Make sure that there are no PostgreSQL 11 references in this file: /var/lib/pgsql/13/data/postgresql.conf. An example of such a reference is this one:
data_directory = '/var/lib/postgresql/11/data'.
-
Repair replication for PostgreSQL 13. To do this, create the /usr/local/bin/db-replica-repair-13.sh script with the content below, specifying the master database's IP address in the maindb parameter:
maindb= ###### fill correct value for master database ######
echo "Stopping DB.."
systemctl stop postgresql-11
echo "Backing up.."
cp -f /var/lib/pgsql/13/data/postgresql.conf /root/postgresql.conf.bak
cp -f /var/lib/pgsql/13/data/pg_hba.conf /root/pg_hba.conf.bak
echo "Rebuilding DB.."
rm -rf /var/lib/pgsql/13/data
sudo -u postgres pg_basebackup -D /var/lib/pgsql/13/data -U repl -h $maindb -P -v -R -X fetch
if [ "$?" != "0" ]; then
echo "Failed."
exit 1
fi
echo "Restoring configs.."
mv -f /root/postgresql.conf.bak /var/lib/pgsql/13/data/postgresql.conf
echo -e "\nhot_standby=on\n" >> /var/lib/pgsql/13/data/postgresql.conf
cp -f /root/pg_hba.conf.bak /var/lib/pgsql/13/data/pg_hba.conf
chown postgres:postgres -R /var/lib/pgsql/13/data/
echo "Starting DB.."
systemctl start postgresql-13
if [ "$?" != "0" ]; then
echo "Failed."
exit 1
fi
sleep 5
echo "Verifying.."
t1=$(sudo -u postgres psql -Aqtc "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::INT"2> /dev/null)
sleep 30
t2=$(sudo -u postgres psql -Aqtc "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::INT"2> /dev/null)
if (( $t2-$t1 >= 30 )); then
echo "Replica rebuild failed. Please try again"
exit 1
else
echo "All done!"
fi
-
Run the script to fix replication:
sh /usr/local/bin/db-replica-repair-13.sh
-
Enable and start PostgreSQL 13 service
systemctl enable postgresql-13
systemctl start postgresql-13
-
After repairing the replica, change the wal_keep_size parameter value to 8GB in the /var/lib/pgsql/13/data/postgresql.conf file of the master database and reload PostgreSQL configuration:
systemctl reload postgresql-13
Known Issues
-
Execution of the /usr/local/bin/db-replica-repair-13.sh script can stuck with the following message:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
If this happens, create CHECKPOINT in the master database and execute the following on the main DB host:
sudo -u postgres psql -c "CHECKPOINT;"
Disaster Recovery
In case of any issue during the upgrade process, you need to:
- Stop all applications connected to the PostgreSQL database as described in step 1.
-
Stop and disable the postgresql-13 service:
systemctl stop postgresql-13
systemctl disable postgresql-13
-
Restore the production pg_hba.conf:
su - postgres -c 'mv /var/lib/pgsql/11/data/pg_hba.conf.prod /var/lib/pgsql/11/data/pg_hba.conf'
-
Enable and start the postgresql-11 service:
systemctl enable postgresql-11
systemctl start postgresql-11
-
For the BSS database, restore the "WITH OIDS" option in PostgreSQL 11 cluster.
su - postgres -c 'psql -U postgres -d pba -f /var/lib/pgsql/alter_tables_without_oids.sql'
Start all application services, as described in step 15.
-
Restore the production pg_hba.conf:
su - postgres -c 'mv /var/lib/pgsql/11/data/pg_hba.conf.prod /var/lib/pgsql/11/data/pg_hba.conf'
Make sure to keep PostgreSQL 13 data for further investigation.
This completes the recovery procedure.