2188/data mapping (#88)
Co-authored-by: Swagger V2 bot <[email protected]>
Co-authored-by: Jason Paige <[email protected]>
Co-authored-by: lucas-phillips28 <[email protected]>
4 people authored Jan 25, 2024
1 parent 6f79cc0 commit af92b6b
Showing 33 changed files with 1,906 additions and 11 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -23,3 +23,4 @@ bin/main/application.yaml

applicationinsights-agent-*.jar
*.log
bin/migration/failed_imports_log.txt
108 changes: 108 additions & 0 deletions bin/migration/README.md
@@ -0,0 +1,108 @@
# Database Migration Script

This script manages the migration of data from a source database to a destination database.


## How to Run the Script

1. **Set Environment Variables:** Export the source and destination database passwords:
```
export SOURCE_DB_PASSWORD=<source_db_password>
export DESTINATION_DB_PASSWORD=<destination_db_password>
```
2. **Install Dependencies:** Install the required Python packages if not installed already:
```
pip install -r requirements.txt
```
3. **Execute the Script:** Run the migration script:
```
python main_script.py
```
4. **Test the migrated counts:** Run the summary script:
```
python summary.py
```
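Both scripts read the passwords with `os.environ.get`, which returns `None` when a variable is unset, so a missing password only surfaces once the connection attempt fails. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of this commit.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for each variable, or raise naming the missing ones."""
    # `environ` is injectable so the check can be exercised without touching os.environ.
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}
```

Calling `require_env(["SOURCE_DB_PASSWORD", "DESTINATION_DB_PASSWORD"])` before creating any connection would stop the run with a readable message instead of a driver error.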
## Summary
The `summary.py` file prints the record counts for each table in the source and destination databases, along with the number of failed imports recorded in the log.
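The counts can also be compared programmatically. The sketch below is an illustration only: `find_count_mismatches` is a hypothetical function, and it assumes that every source row is either migrated or logged as failed exactly once and that the destination tables started empty. That assumption will not hold for tables that gain extra rows during migration, such as a default court.

```python
def find_count_mismatches(source_counts, destination_counts, failed_counts, table_mapping):
    """Return {destination_table: unaccounted_rows} where the counts do not add up."""
    mismatches = {}
    for source_table, destination_table in table_mapping.items():
        source = source_counts.get(source_table)
        destination = destination_counts.get(destination_table)
        if source is None or destination is None:
            continue  # table absent on one side, so there is nothing to compare
        failed = failed_counts.get(source_table, 0)
        if source != destination + failed:
            mismatches[destination_table] = source - destination - failed
    return mismatches
```

A positive value means rows are missing from the destination without a matching log entry; a negative value means the destination holds more rows than the source explains.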
## DatabaseManager Class
### Methods:
- **`__init__(self, database, user, password, host, port)`:** Initializes the DatabaseManager class and opens a connection to the given database.
- **`execute_query(self, query, params=None)`:** Executes queries and fetches results.
- **`close_connection(self)`:** Closes the database connections.
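To show the three-method shape without needing a PostgreSQL server, here is a minimal sketch that swaps `psycopg2` for the standard library's `sqlite3`. The class name `SketchDatabaseManager` and the `rooms` table are assumptions for illustration; note that `sqlite3` uses `?` placeholders where `psycopg2` uses `%s`.

```python
import sqlite3


class SketchDatabaseManager:
    """Same method names as DatabaseManager, backed by sqlite3 for illustration."""

    def __init__(self, database):
        self.connection = sqlite3.connect(database)
        # The cursor, not the connection, is what executes and fetches.
        self.cursor = self.connection.cursor()

    def execute_query(self, query, params=None):
        self.cursor.execute(query, params or ())
        return self.cursor.fetchall()

    def close_connection(self):
        self.cursor.close()
        self.connection.close()


db = SketchDatabaseManager(":memory:")
db.execute_query("CREATE TABLE rooms (id INTEGER, name TEXT)")
db.execute_query("INSERT INTO rooms VALUES (?, ?)", (1, "Room A"))
rows = db.execute_query("SELECT name FROM rooms WHERE id = ?", (1,))
db.close_connection()
```

One difference worth keeping in mind: with `psycopg2`, calling `fetchall()` after a statement that returns no rows raises `ProgrammingError`, so an `execute_query` written this way suits `SELECT` statements rather than inserts or DDL.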
## Helper Functions
### `parse_to_timestamp(input_text)`
Parses date strings into UK timestamps, handling various date formats and returning the current time in the UK timezone if the input is invalid or empty.
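The behaviour described above could look roughly like the following sketch. The function name `parse_to_timestamp_sketch` and the list of accepted formats are assumptions; the real helper may accept a different set. It uses the standard library `zoneinfo` module (Python 3.9+) rather than the `pytz` package pinned in `requirements.txt`, where `localize()` would be needed instead of `replace(tzinfo=...)`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

UK_TZ = ZoneInfo("Europe/London")
# Assumed formats, tried in order; extend to match the source data.
CANDIDATE_FORMATS = ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_to_timestamp_sketch(input_text):
    """Parse a date string as UK local time; fall back to the current UK time."""
    if input_text:
        for fmt in CANDIDATE_FORMATS:
            try:
                return datetime.strptime(input_text.strip(), fmt).replace(tzinfo=UK_TZ)
            except ValueError:
                continue  # try the next format
    # Empty, None, or unparseable input falls through to "now".
    return datetime.now(UK_TZ)
```

Falling back to the current time keeps the migration moving, at the cost of hiding bad dates; logging the original value alongside the fallback would make such rows easier to find later.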
### `check_existing_record(db_connection, table_name, field, record)`
Checks if a record exists in the database.
### `audit_entry_creation(db_connection, table_name, record_id, record, created_at=None, created_by="Data Entry")`
Creates an audit entry in the database for a new record.
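The two helpers above are typically used together: check first, then insert and audit. The sketch below illustrates that pattern with `sqlite3` standing in for `psycopg2`. The `_sketch` function names, their simplified signatures, and the `audits` column names are all assumptions and may differ from the real schema.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def check_existing_record_sketch(cursor, table_name, field, value):
    """Return True if table_name has a row whose field equals value."""
    # Table and column names cannot be bound as parameters, so they must
    # come from trusted code, never from user input.
    cursor.execute(f"SELECT 1 FROM {table_name} WHERE {field} = ? LIMIT 1", (value,))
    return cursor.fetchone() is not None


def audit_entry_creation_sketch(cursor, table_name, record_id, created_by="Data Entry"):
    """Insert one audit row describing a newly created record."""
    cursor.execute(
        "INSERT INTO audits (id, table_name, table_record_id, created_by, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), table_name, record_id, created_by,
         datetime.now(timezone.utc).isoformat()),
    )


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE rooms (id TEXT, name TEXT)")
cur.execute("CREATE TABLE audits (id TEXT, table_name TEXT, table_record_id TEXT, "
            "created_by TEXT, created_at TEXT)")

# Insert and audit only when the record is not already present.
if not check_existing_record_sketch(cur, "rooms", "id", "room-1"):
    cur.execute("INSERT INTO rooms VALUES (?, ?)", ("room-1", "Room A"))
    audit_entry_creation_sketch(cur, "rooms", "room-1")
conn.commit()
```

Checking before inserting makes the migration safe to re-run: a second pass finds the existing row and writes neither a duplicate record nor a duplicate audit entry.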
### `log_failed_imports(failed_imports, filename='failed_imports_log.txt')`
Writes an entry to `failed_imports_log.txt` whenever a record fails to import.
### `clear_migrations_file(filename='failed_imports_log.txt')`
Clears the failed imports log before the migration starts, to avoid duplicate entries from earlier runs.
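A minimal sketch of the two logging helpers follows. The line format is an assumption inferred from how `summary.py` parses the log (the table name follows the first `": "` in the first comma-separated field); the `_sketch` names, the tuple layout of `failed_imports`, and the `ID` and `Details` fields are hypothetical.

```python
import os
import tempfile


def log_failed_imports_sketch(failed_imports, filename):
    """Append one line per failed record: 'Table: <name>, ID: <id>, Details: <text>'."""
    with open(filename, "a") as handle:
        for table_name, record_id, details in failed_imports:
            handle.write(f"Table: {table_name}, ID: {record_id}, Details: {details}\n")


def clear_migrations_file_sketch(filename):
    """Truncate (or create) the log so a new run starts empty."""
    with open(filename, "w"):
        pass


log_path = os.path.join(tempfile.mkdtemp(), "failed_imports_log.txt")
clear_migrations_file_sketch(log_path)
log_failed_imports_sketch(
    [("contacts", "42", "missing case"), ("cases", "7", "bad date")],
    log_path,
)
with open(log_path) as handle:
    logged_lines = handle.read().splitlines()
```

Opening in append mode lets every table manager write to the same file during one run, which is why the file has to be cleared once at the start rather than by each writer.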
## Main Logic
1. Initializes database connections.
2. Executes migration logic for each table manager.
3. Closes database connections.
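The three steps above can be sketched as a loop over an ordered list of managers. The stub classes and `migrate_all` are hypothetical; a plain list stands in for the destination cursor. The dispatch mirrors the script's check for an optional `get_data` method, which managers constructed without a source cursor presumably lack.

```python
import time


class StubSourceManager:
    """Hypothetical manager that reads source rows before migrating them."""

    def get_data(self):
        return [("room-1",), ("room-2",)]

    def migrate_data(self, destination, source_data):
        destination.extend(source_data)


class StubStaticManager:
    """Hypothetical manager that needs no source query."""

    def migrate_data(self, destination):
        destination.append(("region-1",))


def migrate_all(managers, destination):
    """Run each manager in order and return (class name, seconds taken) pairs."""
    timings = []
    for manager in managers:
        started = time.perf_counter()
        if callable(getattr(manager, "get_data", None)):
            manager.migrate_data(destination, manager.get_data())
        else:
            manager.migrate_data(destination)
        timings.append((type(manager).__name__, time.perf_counter() - started))
    return timings


destination_rows = []  # stands in for the destination cursor
timings = migrate_all([StubSourceManager(), StubStaticManager()], destination_rows)
```

Keeping the managers in a list makes the run order explicit in one place, which matters if later tables reference rows created by earlier ones.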
## Table Managers
### RoomManager
Handles the migration of room data.
### UserManager
Manages the migration of user data.
### RoleManager
Manages user roles migration.
### CourtManager
Handles the migration of court-related data. A 'Default Court' is added for records that have no data on which court they were tried in.
### CourtRoomManager
Manages the migration of courtroom data.
### RegionManager
Manages the migration of region-related data.
### CourtRegionManager
Handles associations between courts and regions.
### PortalAccessManager
Manages user access to portals. The assumption is that Level 3 users have access to the Portal.
### AppAccessManager
Handles user access to applications. The assumption is that users in every role except Level 3 have this access.
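The portal and application access assumptions amount to a partition of users by role. A minimal sketch, assuming users arrive as `(user_id, role)` pairs; `split_access` and every role label other than "Level 3" are hypothetical.

```python
def split_access(users):
    """Partition (user_id, role) pairs into portal and app access lists."""
    portal_access, app_access = [], []
    for user_id, role in users:
        if role == "Level 3":
            portal_access.append(user_id)  # Level 3 users get Portal access
        else:
            app_access.append(user_id)  # every other role gets application access
    return portal_access, app_access


portal, app = split_access([("u1", "Level 1"), ("u2", "Level 3"), ("u3", "Super User")])
```

Under this rule each user lands in exactly one of the two lists, so a user who needs both kinds of access would have to be handled separately.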
### CaseManager
Manages the migration of case-related data.
### BookingManager
Handles the migration of booking-related data.
### ParticipantManager
Manages the migration of participant-related data.
### BookingParticipantManager
Handles associations between bookings and participants.
### CaptureSessionManager
Manages the migration of capture session data.
### RecordingManager
Handles the migration of recording data.
24 changes: 24 additions & 0 deletions bin/migration/db_utils.py
@@ -0,0 +1,24 @@
import psycopg2


class DatabaseManager:
    def __init__(self, database, user, password, host, port):
        # Open one connection to the given PostgreSQL database.
        self.connection = psycopg2.connect(
            database=database,
            user=user,
            password=password,
            host=host,
            port=port
        )
        # A psycopg2 connection has no execute(); queries go through a cursor.
        self.cursor = self.connection.cursor()

    def execute_query(self, query, params=None):
        # Intended for statements that return rows (e.g. SELECT).
        self.cursor.execute(query, params)
        return self.cursor.fetchall()

    def close_connection(self):
        self.cursor.close()
        self.connection.close()




133 changes: 133 additions & 0 deletions bin/migration/main_script.py
@@ -0,0 +1,133 @@
import os
import time

from db_utils import DatabaseManager

from tables.rooms import RoomManager
from tables.users import UserManager
from tables.roles import RoleManager
from tables.courts import CourtManager
from tables.courtrooms import CourtRoomManager
from tables.regions import RegionManager
from tables.courtregions import CourtRegionManager
from tables.portalaccess import PortalAccessManager
from tables.appaccess import AppAccessManager
from tables.cases import CaseManager
from tables.bookings import BookingManager
from tables.participants import ParticipantManager
from tables.bookingparticipants import BookingParticipantManager
from tables.capturesessions import CaptureSessionManager
from tables.recordings import RecordingManager
from tables.sharebookings import ShareBookingsManager
from tables.audits import AuditLogManager

from tables.helpers import clear_migrations_file


# get passwords from env variables
source_db_password = os.environ.get('SOURCE_DB_PASSWORD')
destination_db_password = os.environ.get('DESTINATION_DB_PASSWORD')
test_db_password = os.environ.get('TEST_DB_PASSWORD')
staging_db_password = os.environ.get('STAGING_DB_PASSWORD')


# database connections
# staging db
# source_db = DatabaseManager(
#     database="pre-pdb-stg",
#     user="psqladmin",
#     password=staging_db_password,
#     host="pre-db-stg.postgres.database.azure.com",
#     port="5432",
# )


# test db
# source_db = DatabaseManager(
#     database="pre-pdb-test",
#     user="psqladmin",
#     password=test_db_password,
#     host="pre-db-test.postgres.database.azure.com",
#     port="5432",
# )

# demo database
source_db = DatabaseManager(
    database="pre-pdb-demo",
    user="psqladmin",
    password=source_db_password,
    host="pre-db-demo.postgres.database.azure.com",
    port="5432",
)


# dummy database on dev server
destination_db = DatabaseManager(
    database="dev-pre-copy",
    user="psqladmin",
    password=destination_db_password,
    host="pre-db-dev.postgres.database.azure.com",
    port="5432",
)

# managers for different tables
room_manager = RoomManager(source_db.connection.cursor())
user_manager = UserManager(source_db.connection.cursor())
role_manager = RoleManager(source_db.connection.cursor())
court_manager = CourtManager(source_db.connection.cursor())
courtroom_manager = CourtRoomManager()
region_manager = RegionManager()
court_region_manager = CourtRegionManager()
portal_access_manager = PortalAccessManager(source_db.connection.cursor())
app_access_manager = AppAccessManager(source_db.connection.cursor())
case_manager = CaseManager(source_db.connection.cursor())
booking_manager = BookingManager(source_db.connection.cursor())
participant_manager = ParticipantManager(source_db.connection.cursor())
booking_participant_manager = BookingParticipantManager(source_db.connection.cursor())
capture_session_manager = CaptureSessionManager(source_db.connection.cursor())
recording_manager = RecordingManager(source_db.connection.cursor())
share_bookings_manager = ShareBookingsManager(source_db.connection.cursor())
audit_log_manager = AuditLogManager(source_db.connection.cursor())

def migrate_manager_data(manager, destination_cursor):
    start_time = time.time()
    print(f"Migrating data for {manager.__class__.__name__}...")

    if hasattr(manager, 'get_data') and callable(getattr(manager, 'get_data')):
        source_data = manager.get_data()
        manager.migrate_data(destination_cursor, source_data)
    else:
        manager.migrate_data(destination_cursor)

    end_time = time.time()
    time_taken = end_time - start_time
    print(f"Data migration for {manager.__class__.__name__} complete in : {time_taken:.2f} seconds.\n")

def main():
    clear_migrations_file()

    destination_db_cursor = destination_db.connection.cursor()

    migrate_manager_data(room_manager, destination_db_cursor)
    migrate_manager_data(user_manager, destination_db_cursor)
    migrate_manager_data(role_manager, destination_db_cursor)
    migrate_manager_data(court_manager, destination_db_cursor)
    migrate_manager_data(courtroom_manager, destination_db_cursor)
    migrate_manager_data(region_manager, destination_db_cursor)
    migrate_manager_data(court_region_manager, destination_db_cursor)
    migrate_manager_data(portal_access_manager, destination_db_cursor)
    migrate_manager_data(app_access_manager, destination_db_cursor)
    migrate_manager_data(case_manager, destination_db_cursor)
    migrate_manager_data(booking_manager, destination_db_cursor)
    migrate_manager_data(participant_manager, destination_db_cursor)
    migrate_manager_data(capture_session_manager, destination_db_cursor)
    migrate_manager_data(recording_manager, destination_db_cursor)
    migrate_manager_data(booking_participant_manager, destination_db_cursor)
    migrate_manager_data(share_bookings_manager, destination_db_cursor)
    migrate_manager_data(audit_log_manager, destination_db_cursor)

    source_db.close_connection()
    destination_db.close_connection()

if __name__ == "__main__":
    main()
13 changes: 13 additions & 0 deletions bin/migration/requirements.txt
@@ -0,0 +1,13 @@
attrs==22.2.0
importlib-metadata==4.8.3
iniconfig==1.1.1
packaging==21.3
pluggy==1.0.0
psycopg2==2.9.8
py==1.11.0
pyparsing==3.1.1
pytest==7.0.1
pytz==2023.3.post1
tomli==1.2.3
typing_extensions==4.1.1
zipp==3.6.0
104 changes: 104 additions & 0 deletions bin/migration/summary.py
@@ -0,0 +1,104 @@
import psycopg2
import os

# Connection
source_db_password = os.environ.get('SOURCE_DB_PASSWORD')
destination_db_password = os.environ.get('DESTINATION_DB_PASSWORD')

destination_conn = psycopg2.connect(
    database="dev-pre-copy",
    user="psqladmin",
    password=destination_db_password,
    host="pre-db-dev.postgres.database.azure.com",
    port="5432",
)

source_conn = psycopg2.connect(
    database="pre-pdb-demo",
    user="psqladmin",
    password=source_db_password,
    host="pre-db-demo.postgres.database.azure.com",
    port="5432",
)

# table mapping from old db table names to new
table_mapping = {
    'recordings': 'recordings',
    'share_recordings': 'share_recordings',
    'portal_access': 'portal_access',
    'audits': 'audits',
    'courts': 'courts',
    'court_region': 'court_region',
    'regions': 'regions',
    'courtrooms': 'courtrooms',
    'rooms': 'rooms',
    'contacts': 'participants',
    'bookings': 'bookings',
    'cases': 'cases',
    'booking_participant': 'booking_participant',
    'roles': 'roles',
    'role_permission': 'role_permission',
    'permissions': 'permissions',
    'users': 'users',
    'app_access': 'app_access',
    'capture_sessions': 'capture_sessions'
}

# Counts the number of records in all tables in a provided db connection
def count_records_in_all_tables(connection):
    cursor = connection.cursor()

    cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_type = 'BASE TABLE'")
    tables = cursor.fetchall()

    table_counts = {}
    for table in tables:
        table_name = table[0]
        cursor.execute(f"SELECT COUNT(*) FROM public.{table_name}")
        count = cursor.fetchone()[0]
        table_counts[table_name] = count

    cursor.close()
    return table_counts

# Parses the failed imports log file to count the number of failed imports for each table.
def count_failed_imports(file_path):
    table_counts = {}

    with open(file_path, 'r') as file:
        for line in file:
            split_line = line.split(', ')
            if len(split_line) >= 2:
                # The table name follows the first ': ' in the first field.
                table_name = split_line[0].split(': ')[1].strip()

                if table_name in table_counts:
                    table_counts[table_name] += 1
                else:
                    table_counts[table_name] = 1
    return table_counts

source_table_counts = count_records_in_all_tables(source_conn)
destination_table_counts = count_records_in_all_tables(destination_conn)

file_path = 'failed_imports_log.txt'
failed_imports = count_failed_imports(file_path)

# Displays the record counts in both source and destination db and the failed logs. This is to monitor for data loss.
def print_summary(source_counts, destination_counts, failed_imports):
    print(f"| {'Table Name'.ljust(20)} | {'Source DB Records'.ljust(18)} | {'Destination DB Records'.ljust(26)} | {'Failed Imports Logs'.ljust(19)} ")
    print(f"| {'------------'.ljust(20)} | {'------------------'.ljust(18)} | {'------------------'.ljust(26)} | {'---------------'.ljust(19)} ")

    for source_table, destination_table in table_mapping.items():
        source_records = source_counts.get(source_table, '-')
        destination_records = destination_counts.get(destination_table, '-')
        failed_import_count = failed_imports.get(source_table, '-')

        print(f"| {destination_table.ljust(20)} | {str(source_records).ljust(18)} | {str(destination_records).ljust(26)} | {str(failed_import_count).ljust(19)}")

print_summary(source_table_counts, destination_table_counts, failed_imports)

source_conn.close()
destination_conn.close()


