This is a framework to help to check consistency of database data between two different tables whether or not they are in the same database.
The steps are as follow:
- Configure database connection(s)
- Mapping tables columns
- Configure uinique identifiers (UIDs)
- Filters
- Functions
To compare two tables you need first to create the Database Configuration to access the database. If you are comparing tables from two distinct databases you will need to create two configurations.
To create a Database Configuration you use the class dba.DBConfiguration
.
from dba import DBConfiguration
config = DBConfiguration()
config.usr = "dbuser"
config.pwd = "dbpassword"
config.host = "dbip"
config.databse = "dbname"
To check whether or not the columns have the same value we must create a map from the first table columns to the second table columns.
The Mirror
uses an array of mirror.FieldLink
object to do the mapping.
Let's consider we have the followig structure in the tables we want to compare:
id | name | gender |
---|---|---|
1 | John Doe | M |
2 | Peter Chan | M |
3 | Catherine Mustache | F |
id | full_name | gender |
---|---|---|
1 | John Doe | Male |
2 | Peter Chan | Male |
3 | Catherine Mustache | Female |
The configuration for a table like this would be:
from dba import FieldLink
links = [
FieldLink("id", "id"),
FieldLink("name", "full_name"),
FieldLink("gender", "gender")
]
To successfully compare the data from the tables, the Mirror
must know what makes a record in your table unique and these values must match in both tables.
In this example we only have the
id
column to distinguish the records. Its preferrable to avoid using it. If not possible you need to make sure theid
values match for each record. Preferrably use fields that are unique but there is no chance that they will cahnge from one table to another likeusername
or
To do this you just need to set the property uid
to True
in the field link constructor.
links = [
FieldLink("id", "id", uid=True),
FieldLink("name", "full_name"),
FieldLink("gender", "gender")
]
You can also have more than one field as uid
e.g.:
links = [
FieldLink("id", "id", uid=True),
FieldLink("email", "email", uid=True),
FieldLink("name", "full_name"),
FieldLink("gender", "gender")
]
What we are looking for here is to find what makes that record unique. We can use the id only if it matches in both databases.
When comparing data from two distinct tables you may want to compare just a small dataset of the whole table.
For this you can use the filterr
property when initializing your link.
Let's say you only want to compare records of the Male gender:
links = [
FieldLink("id", "id", uid=True),
FieldLink("name", "full_name"),
FieldLink("gender", "gender", filterr=True, filter1_val="M", filter2_val="Male")
]
You may have noted that the value of gender
in both tables are different. In this case we will always have a negative when comparing this two tables.
To be able to get a positive when comparing these values you can create a custom function that will take the column value as an argument and pass it in the FieldLink
constructor.
def gender_parser(val):
if val == "M":
return "Male"
elif val == "F":
return "Female"
else:
return val
links = [
FieldLink("id", "id", uid=True),
FieldLink("name", "full_name"),
FieldLink("gender", "gender", func1=gender_parser)
]
You also have the option of parsing the values of the second table using the
func2
argument. This is very unlikely to be used but it is there in case both values need to be converted to something else for comparison.
After creating the DBConfigurations and all the neccessary links we need the create the Mirror
object which is who does all the work.
# Tables are in the same database since we are using the same configuration.
mirror = Mirror(config, config, "Person", "person", links)
# What actually does all the stuff
mirror.run_diff()
# Export all the unmatched found to csv file in current directory
mirror.to_csv()