Skip to content

Latest commit

 

History

History
201 lines (162 loc) · 7.49 KB

GETTING-STARTED.md

File metadata and controls

201 lines (162 loc) · 7.49 KB

pgEdge Platform - Getting Started Guide

In this guide we will step through setting up pgEdge Platform. We’ll focus for now on setting this up on Rocky Linux 8, however very similar steps are used on Debian based systems or on MacOS. Our goal in this guide is to create a multi-active pgEdge cluster and then use pgbench to create some representative tables and read/write activity on the cluster.

Two software components from pgEdge will be used in this guide. First, we’ll install nodectl which is used to configure Postgres and install additional extensions. Second is Spock, the Postgres extension providing logical, multi-active replication.

You’ll need root permissions on these systems in order to install any OS packages needed as prerequisites.

Prerequisites

  • Enterprise Linux (EL) 8+, Ubuntu 20.04+, Debian 10+, OSX M1/M2
  • A non-root user with sudo privileges
  • systemctl available
  • Python 3.6+& an updated pip3
  • Python packages install via updated pip3 as described below
  • Two servers (vm's are fine) networked with traffic on port 5432 allowed
  • SSH access into the servers

On EL8+ systems, we’ll make sure these packages are installed:

 yum install -y python39 python39-pip curl sudo

On Debian based systems, we'll make sure the native packages are installed:

apt install -y python3 python3-devel python3-psutil

On both EL8+ & Debian based systems, we'll make sure the latest pip3 is installed and then install psycopg3 as follows:

pip3 install psycopg psycopg-binary --user

Installation

In any directory owned by your user, use the following command to install nodectl:

python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"

cd into the pgedge directory created and install the pgEdge Platform with the nc command. Specify a superuser name, password, and a database name. Note that the names cannot be pgEdge and cannot be any postgreSQL reserved words. For the examples given in this documentation, I will be using a database named demo.

./nc spock install -U <superuser-name> -P <superuser-password> -d <database-name>

For this demo I will be using the following command:

./nc spock install -U admin -P mypassword1 -d demo

If you encounter an error running this command, you may need to update your SELINUX mode to permissive, reboot, and retry the operation.

Configuration

Using nodectl on each node, create the spock components needed for replication. First you will create a spock node by providing the name of the node, network address, and database name. You will provide the IP address of each node and the name of the pgedge user which has been created for replication, not the super user you created. Next you will make replication sets by providing the replication set name and the database name. For both the node name (n1) and the replication set name (demo_replication_set), these can be whatever you want but you will have to reference them in future commands.

Node n1 (IP address 10.1.2.5):

./nc spock create-node n1 "host=10.1.2.5 user=pgedge dbname=demo" demo
./nc spock create-repset demo_replication_set demo

Node n2 (IP address 10.2.2.5):

./nc spock create-node n2 'host=10.2.2.5 user=pgedge dbname=demo' demo
./nc spock create-repset demo_replication_set demo

Next, use nodectl to create the subscriptions. For these commands you will provide the subscription name, the network address for the node this one is subscribing to, and the database name.

Node n1:

./nc spock create-sub sub_n1n2 'host=10.2.2.5 port=5432 user=pgedge dbname=demo' demo

Node n2:

./nc spock create-sub sub_n2n1 'host=10.1.2.5 port=5432 user=pgedge dbname=demo' demo

At this point, you will have a two node cluster with cross subscriptions on n1 to n2 and n2 to n1. For replication to begin, you will need to add tables to the replication sets and then add those replications to the subscriptions. For this demo, I will be using pgBench to set up a very simple four table database.

You can source the postgres environment variables and connect to your database with:

source pg15/pg15.env

This also adds pgbench and psql to your PATH. When using either command, you will still need to specify your database name, for example:

psql demo

Initialize with the pgBench command, psql into the postgreSQL database:

pgbench -i demo

Once connected to the database, alter the numeric columns to have LOG_OLD_VALUE equal to true. This will make these numeric fields Conflict-Free Delta-Apply columns.

ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (LOG_OLD_VALUE=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (LOG_OLD_VALUE=true);

Dump and restore the database on to node 2 so that you are left with two databases with the same schema and data.

n1:

pg_dump -v -d demo > dump-1.sql

Copy file to n2, and run on n2:

psql demo -f dump-1.sql

Run the following on both nodes to add these tables to the replication set. The fourth table, pgbench_history, cannot be added because it does not have a primary key.

./nc spock repset-add-table demo_replication_set pgbench_branches demo
./nc spock repset-add-table demo_replication_set pgbench_tellers demo
./nc spock repset-add-table demo_replication_set pgbench_accounts demo

Finish the set up by adding the replication sets to the subscriptions you had created.
n1:

./nc spock sub-add-repset sub_n1n2 demo_replication_set demo

n2:

./nc spock sub-add-repset sub_n2n1 demo_replication_set demo

Check the configuration with the following sql statements

demo=# SELECT * FROM spock.node;
node_id | node_name
---------+----------
673694252 | n1
560818415 | n2
(2 rows)
demo=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets  FROM spock.subscription;
   sub_id   | sub_name |	sub_slot_name 	|                	sub_replication_sets             
------------+----------+----------------------+--------------------------------------------------------
 3293941396 | sub_n1n2 | spk_demo_n2_sub_n1n2 | {default,default_insert_only,ddl_sql,demo_replication_set}
(1 row)

Test Replication

Run an update on n1 to see the update on n2.

n1:

demo=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    	0 |
 (1 row)
demo=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1

Check n2:

demo=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler  	 
-----+-----+----------+--------------------------------------------------
   1 |   1 |    	0 | test                               
(1 row)

Run the following command on both nodes at the same time to run pgBench for one minute.

pgbench -R 100 -T 60 -n demo

Check the results on both nodes and see that the sum of the tbalance columns match on both pgbench_tellers tables. Without the Conflict-Free Delta-Apply columns, each conflict would have resulted in accepting the first in, potentially leading to sums that do not match between nodes.

n1:

demo=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -84803
(1 row)

n2:

demo=# SELECT SUM(tbalance) FROM pgbench_tellers;
  sum  |
 ------+
 -84803
(1 row)