2188/data mapping #88

Merged: 60 commits, merged Jan 25, 2024.
Commits
0c57260
Migration of data from demo db to dummy db on dev server
mazzopardi2 Dec 1, 2023
a6d55b3
added readme, added in log for failed imports function, refactored
mazzopardi2 Dec 8, 2023
68a2e5f
refactored code, added in exception blocks
mazzopardi2 Dec 11, 2023
984e2b6
improved participants class to enable logging of failed records, adde…
mazzopardi2 Dec 12, 2023
33d52a0
pushing in failed logs as missed off the last commit
mazzopardi2 Dec 12, 2023
200d015
Added helper function to clear failed migration logs and resolved han…
mazzopardi2 Dec 13, 2023
71a6ea1
resolved issues for non migrated records for participants, court regi…
mazzopardi2 Dec 14, 2023
d0569a9
adding datetime format to helper function, updating created_at, modif…
mazzopardi2 Dec 14, 2023
998dbeb
refactored join table primary key values
mazzopardi2 Dec 15, 2023
4d0914b
using regex to match courts in grouplist to location code dict and ro…
mazzopardi2 Jan 3, 2024
225727d
fetching roles from grouplist table to get names and descriptions and…
mazzopardi2 Jan 3, 2024
3045110
adding logic around status in portal access and updating sql for port…
mazzopardi2 Jan 5, 2024
3afff5d
removing West Midlands region from list as it was duplicated
mazzopardi2 Jan 5, 2024
918d6b3
adding a new court to dictionary and adjusting the regex expression t…
mazzopardi2 Jan 5, 2024
f53988e
renaming court room
mazzopardi2 Jan 5, 2024
97ca2ef
refactoring court regions matching logic
mazzopardi2 Jan 5, 2024
7fc5784
updating court regions dict
mazzopardi2 Jan 5, 2024
c8df2a4
updating courtroom dict for rooms 1-3
mazzopardi2 Jan 9, 2024
5b2fbbb
adding in schema file
mazzopardi2 Jan 9, 2024
ae2177a
making a change to the court room names for rooms 1-3 - Bug S28-2280
mazzopardi2 Jan 9, 2024
939a71d
various changes around the bookings logic
mazzopardi2 Jan 10, 2024
82452d3
Update Swagger v2 Spec
jasonpaige Jan 10, 2024
dc37a52
adding two more date/time formats
mazzopardi2 Jan 10, 2024
1ce1c2d
adding two more date formats in helper function
mazzopardi2 Jan 10, 2024
a0c80a7
refactor booking participant migration to crosscheck participant ids
mazzopardi2 Jan 11, 2024
18fc6c9
fix date handling
mazzopardi2 Jan 11, 2024
b360cbc
removing print statements'
mazzopardi2 Jan 11, 2024
345d9ae
change in courtroom dict
mazzopardi2 Jan 12, 2024
9d6918b
enhance audit entry creation to include modified at timestamp
mazzopardi2 Jan 12, 2024
10b606d
adding in fields for capture sessions, refactored audit record function
mazzopardi2 Jan 15, 2024
90870d3
adding in constraint for capture session data fetch
mazzopardi2 Jan 15, 2024
57718d5
changing logic in bookings to allow for duplicate case ids to go through
mazzopardi2 Jan 16, 2024
2c1d736
change to dictionary re courtroom Kingston
mazzopardi2 Jan 16, 2024
aadc637
fixing logic for booking participant
mazzopardi2 Jan 16, 2024
4e1a0e0
amending cases table field mapping and adding function to fetch dates…
mazzopardi2 Jan 16, 2024
32d9f06
changes around bookings logic, capture_session status, schema changes…
mazzopardi2 Jan 17, 2024
3d99d94
changes to capture sessions to reflect correct urls and changes to au…
mazzopardi2 Jan 17, 2024
6afcfc5
adding logic to get started at and finished at dates for capture sess…
mazzopardi2 Jan 17, 2024
12c0939
adding in logic for share recordings
mazzopardi2 Jan 18, 2024
ab1ae55
updating logic for app access users
mazzopardi2 Jan 18, 2024
79b34f1
removing comments
mazzopardi2 Jan 18, 2024
e2fc6bc
reworking logic for app access and portal access
mazzopardi2 Jan 18, 2024
ea5427b
changes around audit entries updated
mazzopardi2 Jan 18, 2024
be8bfe3
making changes to modified_at dates in cases and bookings
mazzopardi2 Jan 18, 2024
8c9d9ce
changing logic to booking participant class
mazzopardi2 Jan 22, 2024
5c1bf59
making changes to recordings table logic
mazzopardi2 Jan 22, 2024
8c27ebf
adding in logic for instances where cases deleted have no audit trail
mazzopardi2 Jan 22, 2024
8d46b8c
moving the migrate manager for booking participants after the recordi…
mazzopardi2 Jan 23, 2024
7e9d787
remove 'unknown url' from any recording urls that aren't present
mazzopardi2 Jan 23, 2024
0674db4
removing unknown court string from courts, and any cases with no sche…
mazzopardi2 Jan 23, 2024
1fe666e
amending logic around share bookings
mazzopardi2 Jan 24, 2024
a41a126
Changed 'created_by' field in audits table to UUID from VARCHAR
mazzopardi2 Jan 24, 2024
ccd2261
logic change to app and portal access classes
mazzopardi2 Jan 24, 2024
20d924f
ignore log file
jasonpaige Jan 25, 2024
2b3c313
swagger
jasonpaige Jan 25, 2024
e6be279
Move migration scripts into flyway dir
jasonpaige Jan 25, 2024
13069ec
Remove unused schema file
jasonpaige Jan 25, 2024
f00e57b
2188: JPA Entity Aligned to New Migrations (#223)
lucas-phillips28 Jan 25, 2024
6f59b36
Update Swagger v2 Spec
jasonpaige Jan 25, 2024
c889896
Merge branch 'master' into 2188/data-mapping
jasonpaige Jan 25, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -23,3 +23,4 @@ bin/main/application.yaml

applicationinsights-agent-*.jar
*.log
bin/migration/failed_imports_log.txt
108 changes: 108 additions & 0 deletions bin/migration/README.md
@@ -0,0 +1,108 @@
# Database Migration Script

This script manages the migration of data from a source database to a destination database.


## How to Run the Script

1. **Set Environment Variables:** Export the passwords for the source and destination databases:
```
export SOURCE_DB_PASSWORD=<source_db_password>
export DESTINATION_DB_PASSWORD=<destination_db_password>
```

2. **Install Dependencies:** Install the required Python packages if they are not already installed:
```
pip install -r requirements.txt
```

3. **Execute the Script:** Run the migration script:
```
python main_script.py
```

4. **Check the migrated record counts:** Run the summary script:
```
python summary.py
```

## Summary
The `summary.py` script prints, for each table, the record count in the source database, the record count in the destination database, and the number of failed imports recorded in `failed_imports_log.txt`.
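The summary table is meant to help spot data loss, which still requires comparing the columns by eye. As a rough sketch (not part of this PR), the same three count dictionaries that `print_summary()` receives could be reconciled per table by checking that destination records plus failed imports account for every source record. The function name `find_discrepancies` and the accounting rule are assumptions; tables the migration deliberately adds rows to (for example `courts`, which gains a 'Default Court') will be flagged by design.

```python
# Hypothetical reconciliation helper; not part of summary.py.
# Inputs have the same shape as print_summary() uses: a source->destination
# table-name mapping plus per-table count dictionaries.


def find_discrepancies(table_mapping, source_counts, destination_counts, failed_imports):
    """Return {destination_table: (source, destination, failed)} for every
    table where source != destination + failed."""
    discrepancies = {}
    for source_table, destination_table in table_mapping.items():
        source = source_counts.get(source_table)
        destination = destination_counts.get(destination_table)
        if source is None or destination is None:
            # The table is missing on one side, so there is nothing to compare.
            continue
        failed = failed_imports.get(source_table, 0)
        if source != destination + failed:
            discrepancies[destination_table] = (source, destination, failed)
    return discrepancies
```

For example, with 10 `contacts` rows in the source, 8 `participants` rows in the destination and 1 logged failure for `contacts`, the result is `{"participants": (10, 8, 1)}`: one record is unaccounted for.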

## DatabaseManager Class

### Methods:
- **`__init__(self, database, user, password, host, port)`:** Opens a psycopg2 connection to the given PostgreSQL database.
- **`execute_query(self, query, params=None)`:** Executes a query, with optional parameters, and returns all fetched rows.
- **`close_connection(self)`:** Closes the database connection.
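`DatabaseManager` itself does not guarantee that `close_connection()` runs when an error occurs. One option, sketched here as an assumption rather than the PR's code, is to add context-manager support. The `connect` argument is a hypothetical injection point: the real script would pass `psycopg2.connect`, while the standard-library `sqlite3.connect` lets the sketch run without a PostgreSQL server.

```python
import sqlite3


class DatabaseManager:
    """Sketch of DatabaseManager with context-manager support.

    `connect` is hypothetical: pass psycopg2.connect for the real databases,
    or sqlite3.connect for a local dry run.
    """

    def __init__(self, connect, **connect_kwargs):
        self.connection = connect(**connect_kwargs)
        # execute()/fetchall() are cursor methods, so keep a cursor, not the connection.
        self.cursor = self.connection.cursor()

    def execute_query(self, query, params=None):
        # Only pass params when given: psycopg2 skips %-interpolation when
        # params is None, and sqlite3 rejects None as a parameter sequence.
        if params is None:
            self.cursor.execute(query)
        else:
            self.cursor.execute(query, params)
        return self.cursor.fetchall()

    def close_connection(self):
        self.cursor.close()
        self.connection.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when an exception leaves the with-block.
        self.close_connection()
        return False  # never suppress the exception


if __name__ == "__main__":
    with DatabaseManager(sqlite3.connect, database=":memory:") as db:
        db.execute_query("CREATE TABLE rooms (name TEXT)")
        db.execute_query("INSERT INTO rooms VALUES (?)", ("Room 1",))
        print(db.execute_query("SELECT name FROM rooms"))
```

Note that placeholder syntax differs between drivers (`%s` for psycopg2, `?` for sqlite3), so queries are not portable between the two.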

## Helper Functions

### `parse_to_timestamp(input_text)`
Parses date strings into UK timestamps, handling various date formats and returning the current time in the UK timezone if the input is invalid or empty.
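A sketch of how such a helper can work, assuming the standard-library `zoneinfo` module (Python 3.9+) instead of the `pytz` package listed in `requirements.txt`. The format list is hypothetical, since the helper's source is not part of this diff, and the optional `tz` parameter is an addition so the function can be exercised without a time-zone database.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical list of accepted formats, tried in order.
DATE_FORMATS = (
    "%d/%m/%Y %H:%M:%S",
    "%d/%m/%Y %H:%M",
    "%d/%m/%Y",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
)


def parse_to_timestamp(input_text, tz=None):
    """Parse input_text with the first matching format and attach UK time.

    Falls back to the current UK time when input_text is empty or matches
    none of the formats.
    """
    if tz is None:
        tz = ZoneInfo("Europe/London")
    if input_text:
        text = str(input_text).strip()
        for fmt in DATE_FORMATS:
            try:
                # zoneinfo objects can be attached with replace();
                # pytz time zones would need tz.localize() instead.
                return datetime.strptime(text, fmt).replace(tzinfo=tz)
            except ValueError:
                continue
    return datetime.now(tz)
```

Trying formats from most to least specific matters: `"%d/%m/%Y"` listed first would reject `"25/01/2024 09:30"` anyway, but a permissive format placed early could silently drop the time component.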

### `check_existing_record(db_connection, table_name, field, record)`
Checks whether a record already exists in the given table, matching on `field`.

### `audit_entry_creation(db_connection, table_name, record_id, record, created_at=None, created_by="Data Entry")`
Creates an audit entry in the database for a new record.

### `log_failed_imports(failed_imports, filename='failed_imports_log.txt')`
Appends an entry to `failed_imports_log.txt` for each record that fails to import.

### `clear_migrations_file(filename='failed_imports_log.txt')`
Empties the failed imports log at the start of a migration run so that entries from earlier runs are not duplicated.
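The log layout matters because `summary.py` parses it: `count_failed_imports()` splits each line on `', '` and reads the table name from the first segment, after its `': '`. A minimal sketch of both helpers that writes lines in that shape follows. The dict keys (`table_name`, `table_id`, `details`) and the `Record ID`/`Details` labels are assumptions, since the helpers' source is not in this diff.

```python
LOG_FILE = "failed_imports_log.txt"


def log_failed_imports(failed_imports, filename=LOG_FILE):
    """Append one line per failed record.

    Each entry is assumed to be a dict with 'table_name', 'table_id' and
    'details' keys. Lines start with 'Table: <name>, ' so that
    count_failed_imports() in summary.py can recover the table name.
    """
    with open(filename, "a", encoding="utf-8") as log:
        for entry in failed_imports:
            log.write(
                f"Table: {entry['table_name']}, "
                f"Record ID: {entry['table_id']}, "
                f"Details: {entry['details']}\n"
            )


def clear_migrations_file(filename=LOG_FILE):
    """Truncate the log so a new run does not repeat old entries."""
    # Mode "w" creates the file if it is missing and empties it otherwise.
    with open(filename, "w", encoding="utf-8"):
        pass
```

Free-text details may safely contain `', '` because only the first segment of each line is used to identify the table.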

## Main Logic

1. Initializes the source and destination database connections.
2. Clears the failed imports log.
3. Runs the migration for each table manager, in a fixed order.
4. Closes the database connections.

## Table Managers

### RoomManager
Handles the migration of room data.

### UserManager
Manages the migration of user data.

### RoleManager
Manages user roles migration.

### CourtManager
Handles the migration of court-related data. A 'Default Court' is added for records that do not specify which court they were tried in.

### CourtRoomManager
Manages the migration of courtroom data.

### RegionManager
Manages the migration of region-related data.

### CourtRegionManager
Handles associations between courts and regions.

### PortalAccessManager
Manages user access to the portal. This assumes that Level 3 users have portal access.

### AppAccessManager
Manages user access to the application. This assumes that users in every role except Level 3 have app access.
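Taken together, the two access assumptions partition users by role: a user whose role is Level 3 gets portal access, and any other role gets app access. A minimal sketch of that rule, using hypothetical `(user_id, role_name)` pairs and a hypothetical `"Level 3"` label standing in for however the source data actually names the role:

```python
# Hypothetical role label; the real name in the source data may differ.
PORTAL_ROLE = "Level 3"


def split_access(users):
    """Partition (user_id, role_name) pairs into portal and app access lists.

    Level 3 users get portal access; users in every other role get app access.
    """
    portal_access, app_access = [], []
    for user_id, role_name in users:
        if role_name == PORTAL_ROLE:
            portal_access.append(user_id)
        else:
            app_access.append(user_id)
    return portal_access, app_access
```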

### CaseManager
Manages the migration of case-related data.

### BookingManager
Handles the migration of booking-related data.

### ParticipantManager
Manages the migration of participant-related data.

### BookingParticipantManager
Handles associations between bookings and participants.

### CaptureSessionManager
Manages the migration of capture session data.

### RecordingManager
Handles the migration of recording data.

### ShareBookingsManager
Handles the migration of share bookings data.

### AuditLogManager
Handles the migration of audit log data.

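`main_script.py` drives every manager through `migrate_manager_data()`, which relies on duck typing: a manager may expose `get_data()` to read its source rows, and must expose `migrate_data()`, which receives the destination cursor plus, when `get_data()` exists, the fetched rows. A sketch of that contract using `typing.Protocol`, with a hypothetical `ExampleManager` (and a hypothetical `rooms (id, name)` layout) and a stand-in cursor so it runs without a database:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TableManager(Protocol):
    """Hypothetical name for the contract migrate_manager_data() relies on."""

    def migrate_data(self, destination_cursor: Any, *args: Any) -> None: ...


class ExampleManager:
    """Hypothetical manager: reads rows from the source, inserts them into the destination."""

    def __init__(self, source_cursor):
        self.source_cursor = source_cursor

    def get_data(self):
        self.source_cursor.execute("SELECT id, name FROM rooms")
        return self.source_cursor.fetchall()

    def migrate_data(self, destination_cursor, source_data):
        for room_id, name in source_data:
            destination_cursor.execute(
                "INSERT INTO rooms (id, name) VALUES (%s, %s)", (room_id, name)
            )


def run_manager(manager: TableManager, destination_cursor) -> None:
    # Same dispatch as migrate_manager_data(): call get_data() only if present.
    get_data = getattr(manager, "get_data", None)
    if callable(get_data):
        manager.migrate_data(destination_cursor, get_data())
    else:
        manager.migrate_data(destination_cursor)


class RecordingCursor:
    """Stand-in cursor that records statements instead of talking to PostgreSQL."""

    def __init__(self, rows=None):
        self.rows = rows or []
        self.executed = []

    def execute(self, query, params=None):
        self.executed.append((query, params))

    def fetchall(self):
        return self.rows
```

A `runtime_checkable` protocol only checks that the method exists, not its signature, which matches how loosely `migrate_manager_data()` treats managers such as `CourtRoomManager()` that take no source cursor.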
24 changes: 24 additions & 0 deletions bin/migration/db_utils.py
@@ -0,0 +1,24 @@
import psycopg2

class DatabaseManager:
    def __init__(self, database, user, password, host, port):
        self.connection = psycopg2.connect(
            database=database,
            user=user,
            password=password,
            host=host,
            port=port
        )
        # execute() and fetchall() are cursor methods, so keep a cursor rather than the connection
        self.cursor = self.connection.cursor()

    def execute_query(self, query, params=None):
        self.cursor.execute(query, params)
        return self.cursor.fetchall()

    def close_connection(self):
        self.cursor.close()
        self.connection.close()




133 changes: 133 additions & 0 deletions bin/migration/main_script.py
@@ -0,0 +1,133 @@
import os
import time

from db_utils import DatabaseManager

from tables.rooms import RoomManager
from tables.users import UserManager
from tables.roles import RoleManager
from tables.courts import CourtManager
from tables.courtrooms import CourtRoomManager
from tables.regions import RegionManager
from tables.courtregions import CourtRegionManager
from tables.portalaccess import PortalAccessManager
from tables.appaccess import AppAccessManager
from tables.cases import CaseManager
from tables.bookings import BookingManager
from tables.participants import ParticipantManager
from tables.bookingparticipants import BookingParticipantManager
from tables.capturesessions import CaptureSessionManager
from tables.recordings import RecordingManager
from tables.sharebookings import ShareBookingsManager
from tables.audits import AuditLogManager

from tables.helpers import clear_migrations_file


# get passwords from env variables
source_db_password = os.environ.get('SOURCE_DB_PASSWORD')
destination_db_password = os.environ.get('DESTINATION_DB_PASSWORD')
test_db_password = os.environ.get('TEST_DB_PASSWORD')
staging_db_password = os.environ.get('STAGING_DB_PASSWORD')


# database connections
# staging db
# source_db = DatabaseManager(
# database="pre-pdb-stg",
# user="psqladmin",
# password=staging_db_password,
# host="pre-db-stg.postgres.database.azure.com",
# port="5432",
# )


# test db
# source_db = DatabaseManager(
# database="pre-pdb-test",
# user="psqladmin",
# password=test_db_password,
# host="pre-db-test.postgres.database.azure.com",
# port="5432",
# )

# demo database
source_db = DatabaseManager(
database="pre-pdb-demo",
user="psqladmin",
password=source_db_password,
host="pre-db-demo.postgres.database.azure.com",
port="5432",
)


# dummy database on dev server
destination_db = DatabaseManager(
database="dev-pre-copy",
user="psqladmin",
password=destination_db_password,
host="pre-db-dev.postgres.database.azure.com",
port="5432",
)

# managers for different tables
room_manager = RoomManager(source_db.connection.cursor())
user_manager = UserManager(source_db.connection.cursor())
role_manager = RoleManager(source_db.connection.cursor())
court_manager = CourtManager(source_db.connection.cursor())
courtroom_manager = CourtRoomManager()
region_manager = RegionManager()
court_region_manager = CourtRegionManager()
portal_access_manager = PortalAccessManager(source_db.connection.cursor())
app_access_manager = AppAccessManager(source_db.connection.cursor())
case_manager = CaseManager(source_db.connection.cursor())
booking_manager = BookingManager(source_db.connection.cursor())
participant_manager = ParticipantManager(source_db.connection.cursor())
booking_participant_manager = BookingParticipantManager(source_db.connection.cursor())
capture_session_manager = CaptureSessionManager(source_db.connection.cursor())
recording_manager = RecordingManager(source_db.connection.cursor())
share_bookings_manager = ShareBookingsManager(source_db.connection.cursor())
audit_log_manager = AuditLogManager(source_db.connection.cursor())

def migrate_manager_data(manager, destination_cursor):
    start_time = time.time()
    print(f"Migrating data for {manager.__class__.__name__}...")

    # Managers that expose get_data() fetch their source rows first; others only receive the destination cursor
    if hasattr(manager, 'get_data') and callable(getattr(manager, 'get_data')):
        source_data = manager.get_data()
        manager.migrate_data(destination_cursor, source_data)
    else:
        manager.migrate_data(destination_cursor)

    end_time = time.time()
    time_taken = end_time - start_time
    print(f"Data migration for {manager.__class__.__name__} complete in {time_taken:.2f} seconds.\n")

def main():
    # Start each run with an empty failed imports log
    clear_migrations_file()

    destination_db_cursor = destination_db.connection.cursor()

    try:
        # The managers run in a fixed order
        migrate_manager_data(room_manager, destination_db_cursor)
        migrate_manager_data(user_manager, destination_db_cursor)
        migrate_manager_data(role_manager, destination_db_cursor)
        migrate_manager_data(court_manager, destination_db_cursor)
        migrate_manager_data(courtroom_manager, destination_db_cursor)
        migrate_manager_data(region_manager, destination_db_cursor)
        migrate_manager_data(court_region_manager, destination_db_cursor)
        migrate_manager_data(portal_access_manager, destination_db_cursor)
        migrate_manager_data(app_access_manager, destination_db_cursor)
        migrate_manager_data(case_manager, destination_db_cursor)
        migrate_manager_data(booking_manager, destination_db_cursor)
        migrate_manager_data(participant_manager, destination_db_cursor)
        migrate_manager_data(capture_session_manager, destination_db_cursor)
        migrate_manager_data(recording_manager, destination_db_cursor)
        migrate_manager_data(booking_participant_manager, destination_db_cursor)
        migrate_manager_data(share_bookings_manager, destination_db_cursor)
        migrate_manager_data(audit_log_manager, destination_db_cursor)
    finally:
        # Close both connections even if a migration step raises
        source_db.close_connection()
        destination_db.close_connection()

if __name__ == "__main__":
    main()
13 changes: 13 additions & 0 deletions bin/migration/requirements.txt
@@ -0,0 +1,13 @@
attrs==22.2.0
importlib-metadata==4.8.3
iniconfig==1.1.1
packaging==21.3
pluggy==1.0.0
psycopg2==2.9.8
py==1.11.0
pyparsing==3.1.1
pytest==7.0.1
pytz==2023.3.post1
tomli==1.2.3
typing_extensions==4.1.1
zipp==3.6.0
104 changes: 104 additions & 0 deletions bin/migration/summary.py
@@ -0,0 +1,104 @@
import os

import psycopg2
from psycopg2 import sql

# Connections
source_db_password = os.environ.get('SOURCE_DB_PASSWORD')
destination_db_password = os.environ.get('DESTINATION_DB_PASSWORD')

destination_conn = psycopg2.connect(
    database="dev-pre-copy",
    user="psqladmin",
    password=destination_db_password,
    host="pre-db-dev.postgres.database.azure.com",
    port="5432",
)

source_conn = psycopg2.connect(
    database="pre-pdb-demo",
    user="psqladmin",
    password=source_db_password,
    host="pre-db-demo.postgres.database.azure.com",
    port="5432",
)

# table mapping from old db table names to new
table_mapping = {
    'recordings': 'recordings',
    'share_recordings': 'share_recordings',
    'portal_access': 'portal_access',
    'audits': 'audits',
    'courts': 'courts',
    'court_region': 'court_region',
    'regions': 'regions',
    'courtrooms': 'courtrooms',
    'rooms': 'rooms',
    'contacts': 'participants',
    'bookings': 'bookings',
    'cases': 'cases',
    'booking_participant': 'booking_participant',
    'roles': 'roles',
    'role_permission': 'role_permission',
    'permissions': 'permissions',
    'users': 'users',
    'app_access': 'app_access',
    'capture_sessions': 'capture_sessions'
}

# Counts the number of records in every base table of the public schema for the given connection
def count_records_in_all_tables(connection):
    cursor = connection.cursor()

    cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_type = 'BASE TABLE'")
    tables = cursor.fetchall()

    table_counts = {}
    for table in tables:
        table_name = table[0]
        # sql.Identifier quotes the name, so mixed-case or hyphenated table names also work
        cursor.execute(sql.SQL("SELECT COUNT(*) FROM public.{}").format(sql.Identifier(table_name)))
        count = cursor.fetchone()[0]
        table_counts[table_name] = count

    cursor.close()
    return table_counts

# Parses the failed imports log file to count the number of failed imports for each table.
def count_failed_imports(file_path):
    table_counts = {}

    # Treat a missing log file as zero failed imports
    if not os.path.exists(file_path):
        return table_counts

    with open(file_path, 'r') as file:
        for line in file:
            # Each entry starts with "<label>: <table name>, ..."
            split_line = line.split(', ')
            if len(split_line) >= 2:
                table_name = split_line[0].split(': ')[1].strip()
                table_counts[table_name] = table_counts.get(table_name, 0) + 1
    return table_counts

source_table_counts = count_records_in_all_tables(source_conn)
destination_table_counts = count_records_in_all_tables(destination_conn)

file_path = 'failed_imports_log.txt'
failed_imports = count_failed_imports(file_path)

# Displays the record counts in the source and destination dbs alongside the failed import counts, to help spot data loss.
def print_summary(source_counts, destination_counts, failed_imports):
    print(f"| {'Table Name'.ljust(20)} | {'Source DB Records'.ljust(18)} | {'Destination DB Records'.ljust(26)} | {'Failed Imports Logs'.ljust(19)} ")
    print(f"| {'------------'.ljust(20)} | {'------------------'.ljust(18)} | {'------------------'.ljust(26)} | {'---------------'.ljust(19)} ")

    for source_table, destination_table in table_mapping.items():
        source_records = source_counts.get(source_table, '-')
        destination_records = destination_counts.get(destination_table, '-')
        failed_import_count = failed_imports.get(source_table, '-')

        print(f"| {destination_table.ljust(20)} | {str(source_records).ljust(18)} | {str(destination_records).ljust(26)} | {str(failed_import_count).ljust(19)}")

print_summary(source_table_counts, destination_table_counts, failed_imports)

source_conn.close()
destination_conn.close()


