Problem Description
As an engineer, it's confusing to have code that isn't being used or doesn't fully achieve its purpose.
The quality tests were originally added as a way to determine if new transformers were of a lower quality than existing ones. In these tests, quality was defined as how well a transformer captures relationships between columns. To test this, the following steps are taken:
1. A list of test cases is created. Each test case has a dataset and a set of sdtypes to test for that dataset.
2. A dictionary is created mapping each sdtype to a DataFrame containing the regression scores obtained from running the transformers of that sdtype against the datasets in the test cases. Each row in the DataFrame has the transformer name, dataset name, column name and score. The scores are computed as follows (see the sketch after these steps):
   - For every transformer of the sdtype, transform all the columns of that sdtype.
   - For every numerical column in the dataset, use the transformed columns as features to train a regression model.
   - The score is the coefficient of determination obtained from that model when predicting the target column.
3. Once the scores are gathered, a results table is created. Each row has a transformer name, a dataset name, the transformer's average score for the dataset, and a relative score comparing that average to the mean of the average scores for the dataset across all transformers of the same sdtype.
4. For every unique transformer in the results, a test checks that, for each dataset, either the transformer's average score or its relative score is higher than the threshold.
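As a rough illustration, here is a minimal sketch of the scoring and threshold check described above, written with plain scikit-learn. The function names are hypothetical, and the choice of LinearRegression as the regression model and a ratio for the relative score are assumptions; the actual implementation lives in tests/quality/test_quality.py.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def regression_score(transformed_features, target_column):
    """R^2 (coefficient of determination) of a regression model that predicts
    a numerical column from the transformed columns of one sdtype."""
    X_train, X_test, y_train, y_test = train_test_split(
        transformed_features, target_column, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    return model.score(X_test, y_test)  # .score() returns R^2


def transformer_passes(avg_score, sdtype_avg_score, threshold):
    """A transformer passes for a dataset if its average score beats the
    threshold, or if its score relative to the sdtype-wide average does.
    (Using a ratio for the relative score is an assumption.)"""
    relative_score = avg_score / sdtype_avg_score
    return avg_score > threshold or relative_score > threshold
```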
The problem is that the tests haven't been used as intended. Many of our transformers don't do a good job of capturing information that can be used to predict other columns, because that isn't necessarily what they were meant to do on their own. The only transformer that seems to do a very good job is OneHotEncoder, and that makes sense since it creates a new column for every unique value of a category. If there are correlations between those categories and other columns, it can capture them.
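For illustration, here is a tiny example of that one-column-per-category behavior, using pandas.get_dummies as a stand-in for what the OneHotEncoder produces (the column and value names are made up):

```python
import pandas as pd

data = pd.DataFrame({
    'plan': ['basic', 'premium', 'basic', 'premium'],
    'monthly_cost': [10, 50, 10, 50],
})

# One new column per unique category value. Because each of these columns is
# perfectly correlated with 'monthly_cost' in this toy data, a regression
# trained on them can predict 'monthly_cost' exactly, which is the kind of
# cross-column relationship the quality tests reward.
encoded = pd.get_dummies(data['plan'])
```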
In order to get the tests to pass for the other transformers, we simply set the score threshold very low. This doesn't seem to be an effective way to test quality: if the goal was to catch new transformers with bad "quality", it doesn't work well, because a low-quality transformer would probably pass the tests anyway.
We removed the quality tests from the GitHub workflows because we temporarily lost access to the datasets they were using. The question moving forward is whether we should remove the code altogether, since it isn't being used.
Expected behavior
Investigate whether there is any code in the quality test package that should be kept or provides some value.
Additional context
https://github.com/sdv-dev/RDT/blob/master/tests/quality/test_quality.py#L27