Database

Manual Replication Monitoring

Track replication health and performance.


Monitoring replication lag is important and there are 3 ways to do this:

  1. Dashboard - Under the Reports of the dashboard, you can view the replication lag of your project
  2. Database -
    • pg_stat_subscription (subscriber) - if PID is null, then the subscription is not active
    • pg_stat_subscription_stats - look here for error_count to see if there were issues applying or syncing (if yes, check the logs for why)
    • pg_replication_slots - use this to check if the slot is active and you can also calculate the lag from here
  3. Metrics - Using the prometheus endpoint for your project
    • replication_slots_max_lag_bytes - this is the more important one
    • pg_stat_replication_replay_lag - lag to replay WAL files from the source DB on the target DB (throttled by disk or high activity)
    • pg_stat_replication_send_lag - lag in sending WAL files from the source DB (a high lag means that the publisher is not being asked to send new WAL files OR network issues)

Primary

Replication status and lag

The pg_stat_replication table shows the status of any replicas connected to the primary database.

1
2
select pid, application_name, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, sync_statefrom pg_stat_replication;

Replication slot status

A replication slot can be in one of three states:

  • active - The slot is active and is receiving data
  • inactive - The slot is not active and is not receiving data
  • lost - The slot is lost and is not receiving data

The state can be checked using the pg_replication_slots table:

1
select slot_name, active, state from pg_replication_slots;

WAL size

The WAL size can be checked using the pg_ls_waldir() function:

1
select * from pg_ls_waldir();

Check the LSN

1
select pg_current_wal_lsn();

Subscriber

Subscription status

The pg_subscription table shows the status of any subscriptions on a replica and the pg_subscription_rel table shows the status of each table within a subscription.

The srsubstate column in pg_subscription_rel can be one of the following:

  • i - Initializing - The subscription is being initialized
  • d - Data Synchronizing - The subscription is synchronizing data for the first time (i.e. doing the initial copy)
  • s - Synchronized - The subscription is synchronized
  • r - Replicating - The subscription is replicating data
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT sub.subname AS subscription_name, relid::regclass AS table_name, srel.srsubstate AS replication_state, CASE srel.srsubstate WHEN 'i' THEN 'Initializing' WHEN 'd' THEN 'Data Synchronizing' WHEN 's' THEN 'Synchronized' WHEN 'r' THEN 'Replicating' ELSE 'Unknown' END AS state_description, srel.srsyncedlsn AS last_synced_lsnFROM pg_subscription subJOIN pg_subscription_rel srel ON sub.oid = srel.srsubidORDER BY table_name;

Check the LSN

1
select pg_last_wal_replay_lsn();