Guidance on Fairness and Robust Check in AIverify #226
Comments
Hi @imda-benedictlee, good day. Could you tell me why I couldn't complete the robustness check? Are my settings for the robustness check correct? Thank you!
Hi @YZx0pa, thank you for raising this issue, and my apologies for the delay in responding. Rest assured that we have been looking into it. We will get back to you with an answer soon.
Hi @YZx0pa, I have spoken to the developers. They have requested the test-engine-app Docker container logs. To get the logs, start by doing the following:
Hi @YZx0pa, my apologies for the delayed response. The Fairness Metrics Toolbox in AI Verify currently requires the use case to have one or more identified sensitive features in order to select the most relevant fairness metric and generate the confusion matrix (TP, FP, TN, FN). Since your model training does not directly involve a sensitive feature, you could try the following method to identify unintended bias from indirect features by comparing the fairness results of two models.
Analyse the fairness results of the two models from the two generated reports. If the two models have similar fairness results, removing the sensitive attribute from the training data did not affect the fairness of the model. Hence, there might be a direct correlation between the sensitive feature and the suspected indirect feature. To validate this outside of AI Verify, you can run an equality inference test for each suspected indirect feature on the first model, that is, build several test points in which every feature except the suspected indirect feature is assigned the same value. If the predictions for these points vary, there is a possibility of bias leakage.
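For illustration, the equality inference test described above could be sketched roughly as follows. This is a minimal sketch run outside AI Verify, not part of the toolkit: `toy_predict`, the feature names and the values are hypothetical stand-ins, and in practice `predict` would wrap your own model's prediction call.

```python
from typing import Callable, Dict, List, Sequence


def equality_inference_test(
    predict: Callable[[Dict[str, float]], int],
    base_point: Dict[str, float],
    feature: str,
    candidate_values: Sequence[float],
) -> List[int]:
    """Predict on copies of base_point that differ only in `feature`."""
    predictions = []
    for value in candidate_values:
        point = dict(base_point)  # copy, so every other feature stays fixed
        point[feature] = value
        predictions.append(predict(point))
    return predictions


def predictions_vary(predictions: Sequence[int]) -> bool:
    """True if the predictions are not all identical."""
    return len(set(predictions)) > 1


# Hypothetical stand-in for a trained binary classifier (assumption).
def toy_predict(point: Dict[str, float]) -> int:
    return int(point["years_experience"] >= 3 and point["height_cm"] > 165)


base = {"years_experience": 5, "height_cm": 170}
preds = equality_inference_test(toy_predict, base, "height_cm", [150, 160, 170, 180])
print(preds, predictions_vary(preds))
```

If `predictions_vary` returns `True` for a suspected indirect feature, the model's output depends on that feature alone at the chosen test point, which is the signal of possible bias leakage mentioned above. It is worth repeating the check at several different base points.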
Hi @imda-benedictlee, thanks for your advice, and sorry for the late reply! I faced the error below when
Hi @kimeetok, thank you for your guidance, and I apologize for the delayed response. While the method you recommended seems like a valid approach for fairness verification, our client specifically requests a report generated directly by AI Verify to demonstrate the fairness of our current model with respect to gender. I'm exploring whether there is a simpler way for us to generate this fairness report. For instance, could we run the fairness check on the model's prediction results (with or without sensitive features) together with the corresponding sensitive feature information? Additionally, are there any concerns about assessing fairness directly from the model's prediction results and sensitive features?
Hi @YZx0pa, I did some research on the issue you faced. It could be due to corruption of the Docker logs. You can take a look at the issue and a solution at this link: https://copyprogramming.com/howto/docker-error-grabbing-logs-invalid-character-x00-looking-for-beginning-of-value. However, as I have not encountered this issue before, I cannot guarantee that the solution in the link will work for you. If you are able to delete the containers and recreate them, I would suggest taking that route instead: try to replicate the issue, then use the Docker Follow commands stated previously to track it. Do reach out to me if you require additional help.
Hi @imda-benedictlee, thanks a lot for your patient guidance! In my case, I solved the issue with the commands "docker compose down" and "docker compose up"; hopefully these are the log files required. Please see the attached file for the copied logs, and let me know if there is still a problem or if I need to provide more information. Thanks!
Hello @YZx0pa, thanks for providing the log file. It seems that the sensitive feature "gender_id" is not in the dataset. "gender_id" should be a column in the dataset that you are using. Would it be possible to send us a copy of the dataset you are testing so that we can test it on our end? If you send us your dataset, remember to remove any sensitive information. Thanks!
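A quick way to confirm this before uploading is to compare the dataset's column names against the columns the test expects. The helper below is a minimal sketch, not an AI Verify API: `missing_columns` is a hypothetical name and the sample column list is made up, while "gender_id" and "shortlisted" are the sensitive feature and ground truth names that appear in this thread.

```python
from typing import Iterable, List


def missing_columns(dataset_columns: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the dataset."""
    available = set(dataset_columns)
    return [name for name in required if name not in available]


# Hypothetical column list; with a pandas DataFrame `df`, pass `df.columns`.
columns = ["age", "years_experience", "shortlisted"]
print(missing_columns(columns, ["gender_id", "shortlisted"]))
```

An empty list means every required column is present; any name that is returned must be added to the dataset before the fairness test can use it as a sensitive feature.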
Hi @kimeetok, thanks for helping with my case. Could I have your email address so that I can send you the sample data by email? Do I need to provide the model as well? Thanks!
Hi @YZx0pa, yes, we will need the model too. Please send them to [email protected] and [email protected]. Regarding a more direct way to check fairness with regard to indirect sensitive features, you can explore creating a new test that takes in the additional information needed (another model or dataset file, more test arguments, etc.) to do the required calculations. Our developer tools can help you create this new test plugin: https://github.com/IMDA-BTG/aiverify-developer-tools
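As a rough idea of the calculation such a test might perform, the sketch below builds a per-group confusion matrix (TP, FP, TN, FN) from predictions and a separately supplied sensitive feature. It is a standalone illustration, not an AI Verify plugin and not the toolbox's own implementation: the function names and the sample labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def confusion_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, Dict[str, int]]:
    """Count TP, FP, TN and FN separately for each sensitive-feature value."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "TN": 0, "FN": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            key = "TP" if truth == 1 else "FP"
        else:
            key = "FN" if truth == 1 else "TN"
        counts[group][key] += 1
    return dict(counts)


def true_positive_rate(cell: Dict[str, int]) -> float:
    """TP / (TP + FN), or NaN when the group has no positive ground truth."""
    positives = cell["TP"] + cell["FN"]
    return cell["TP"] / positives if positives else float("nan")


# Hypothetical binary labels, predictions and sensitive feature values.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
gender = ["F", "F", "F", "M", "M", "M"]

by_gender = confusion_by_group(y_true, y_pred, gender)
for group, cell in by_gender.items():
    print(group, cell, true_positive_rate(cell))
```

Comparing a rate such as the true positive rate across groups gives a first indication of disparity. Which metric is appropriate depends on the use case, so treat this only as a starting point for a custom test.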
Hello @YZx0pa, I have tried running your model and dataset and it seems like there are two issues:
Hi @imda-kelvinkok , Thanks for your help on my case! Please see my reply below:
""" |
Hello @YZx0pa, thanks for retraining the model. There seem to be a few issues when running the AI Verify report, so I ran the algorithms separately using your configuration to test fairness, explainability and robustness (with both your old and new XGBoost models). Here is what I did and observed:
Fairness Metrics Toolbox for Classification
Robustness
SHAP
We can try your old model (1.6.1), but you will have to downgrade your XGBoost Python library to that version as well. If we use 1.7.4, I am not sure whether there will be other compatibility issues, so the most straightforward way is to downgrade the Python library version. Can you provide me the test dataset with the sensitive feature
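To check which XGBoost version is installed in the environment that will load the pickled model, something like the sketch below can help. It uses only the standard library's `importlib.metadata`; the helper names are hypothetical, and "1.6.1" is the version the old model was trained with, as stated in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def versions_match(installed: Optional[str], expected: str) -> bool:
    """True only when the package is installed at exactly the expected version."""
    return installed is not None and installed == expected


expected = "1.6.1"  # version the old model was trained with
found = installed_version("xgboost")
if versions_match(found, expected):
    print(f"xgboost {found} matches the model's training version")
else:
    print(f"xgboost installed: {found}, expected: {expected}")
```

An exact string comparison is deliberately strict here, since the advice above is to use the same library version as the one the model was trained with.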
Hi @imda-kelvinkok, I sent the updated model and data to your email last Friday as requested. Please let me know if more information is required. Thanks!
Hello @YZx0pa, I sent you an email yesterday; I'm not sure whether you have received it. Let us know if you have further enquiries. Thanks!
Hi @imda-kelvinkok, good morning! Thanks for the advice. I've received your email, downloaded the files and used them to generate the AI Verify report. I still see the same issue: only the SHAP check passes, and both the robustness and fairness checks fail.
Hi @imda-kelvinkok, I appreciate your expert assistance with the AI Verify app Docker setup and source code installation. Thanks to your guidance, we can now generate AI Verify reports using the model trained with gender_id.
Hello @kimeetok, I appreciate your advice. We will continue to explore the issue as you suggested.
Hi @imda-kelvinkok, could I check: after I uploaded a dataset and the upload turned out to have failed, is there a way to remove it from my dataset records? Thanks!
Hi @YZx0pa, yes. From the main page, go to
Hi @imda-kelvinkok, got it, thanks a lot!
Is there an existing issue for this?
Description
I've tested my own dataset (a tabular dataset with an XGBoost version 1.6.1 binary classification model). The SHAP Toolbox shows "Test completed", but both the robustness and fairness tests show "test error", and I only see "Application error" in the report content when clicking "view report".
For the fairness check, I want to clarify that our model training does not directly involve a sensitive feature related to gender. However, I am concerned about the potential for unintended bias from indirect features influencing gender-related outcomes. I would appreciate guidance on how to conduct a fairness check in AI Verify in this scenario.
Regarding the robustness check, I specified the "annotated ground truth path" as "/app/aiverify/uploads/data/test_with_groundtruth.sav" (the same dataset used for the SHAP Toolbox). I also set "name of column contains image files" to "NA".
Thank you!
Current Behavior
Not able to generate the report
Expected Behavior
Able to complete test for robust and fairness toolbox and able to generate the summary report
Steps To Reproduce
NA
Environment
Screenshots/ Code snippets
Robust toolbox log:
2023-11-29 22:56:42,019 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:run_task_processing_in_process(66)]: The task validation is successful: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:run_task_processing_in_process(71)]: Working on task: message_id 1701269801989-0, message_args {"mode":"upload","id":"task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3","algorithmId":"algo:aiverify.stock.robustness_toolbox:robustness_toolbox","algorithmArgs":{"annotated_ground_truth_path":"/app/aiverify/uploads/data/test_with_groundtruth.sav","file_name_label":"NA"},"testDataset":"/app/aiverify/uploads/data/test_with_groundtruth.sav","modelFile":"/app/aiverify/uploads/model/xgb_model_check.pkl","groundTruthDataset":"/app/aiverify/uploads/data/test_with_groundtruth.sav","modelType":"classification","groundTruth":"shortlisted"}, task_type: TaskType.NEW
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_status(146)]: The current task has changed status to TaskStatus.RUNNING
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:process_new_task(110)]: Sending task update
2023-11-29 22:56:42,024 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:detect_pipeline(30)]: Attempting to detect pipeline model from the given path
2023-11-29 22:56:42,024 [INFO][log_utils.py::log_message(31)] [pipeline_manager.py:read_pipeline_path(55)]: Attempting to read pipeline: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,024 [INFO][log_utils.py::log_message(31)] [pipeline_manager.py:read_pipeline_path(83)]: Pipeline validation successful
2023-11-29 22:56:42,025 [ERROR][log_utils.py::log_message(37)] [pipeline_manager.py:read_pipeline_path(101)]: There was an error getting pipeline files in the folder
2023-11-29 22:56:42,025 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:_load_instances(277)]: Unable to find pipeline model. Loading non-pipeline instances
2023-11-29 22:56:42,025 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(56)]: Attempting to read data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,026 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(81)]: Data validation successful
2023-11-29 22:56:42,026 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(89)]: Attempting to deserialize data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,027 [INFO][app_logger.py::add_to_log(116)] [task.py:_send_task_update(278)]: Task has received notification to send task update
2023-11-29 22:56:42,029 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(157)]: Consolidation results: {pandasdata.Plugin object at 0x7f707c349b50} {class 'pickleserializer.Plugin'}
2023-11-29 22:56:42,029 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(181)]: Supported data format: DataPluginType.PANDAS[SerializerPluginType.PICKLE]
2023-11-29 22:56:42,029 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_data(75)]: Data Instance: {pandasdata.Plugin object at 0x7f707c349b50}, Data Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,029 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Running', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '""', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 0}
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(51)]: Attempting to read model: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(78)]: Model validation successful
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(84)]: Attempting to deserialize model: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,134 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(107)]: Attempting to identify model format: {class 'xgboost.sklearn.XGBClassifier'}
2023-11-29 22:56:42,134 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(117)]: Supported model format: {class 'xgboost.sklearn.XGBClassifier'}, ModelPluginType.XGBOOST[xgboost.sklearn.XGBClassifier]
2023-11-29 22:56:42,134 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_model(201)]: Model Instance: {xgboostmodel.Plugin object at 0x7f6fd5dcd510}, Model Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(56)]: Attempting to read data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(81)]: Data validation successful
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(89)]: Attempting to deserialize data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,139 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(157)]: Consolidation results: {pandasdata.Plugin object at 0x7f6fd02a1650} {class 'pickleserializer.Plugin'}
2023-11-29 22:56:42,139 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(181)]: Supported data format: DataPluginType.PANDAS[SerializerPluginType.PICKLE]
2023-11-29 22:56:42,139 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_ground_truth(259)]: GroundTruth Instance: {pandasdata.Plugin object at 0x7f6fd02a1650}, GroundTruth Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,159 [ERROR][log_utils.py::log_message(37)] [algorithm_manager.py:get_algorithm(126)]: There was an error getting algorithm instance (not found): algo:aiverify.stock.robustness_toolbox:robustness_toolbox
2023-11-29 22:56:42,160 [INFO][app_logger.py::add_to_log(116)] [plugin_controller.py:get_plugin_instance(112)]: Attempting to find algo:aiverify.stock.robustness_toolbox:robustness_toolbox in the algorithm registry
2023-11-29 22:56:42,163 [INFO][app_logger.py::add_to_log(116)] [plugin_controller.py:get_plugin_instance(141)]: algo:aiverify.stock.robustness_toolbox:robustness_toolbox is in the algorithm registry. Attempting to re-discover algorithm
2023-11-29 22:56:42,176 [INFO][robustness_toolbox.py::add_to_log(205)] Setup completed
2023-11-29 22:56:42,179 [INFO][log_utils.py::log_message(31)] [algorithm_manager.py:get_algorithm(116)]: Supported algorithm: algo:aiverify.stock.robustness_toolbox:robustness_toolbox, PluginType.ALGORITHM
2023-11-29 22:56:42,180 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_algorithm(356)]: Algorithm Instance: {robustness_toolbox.Plugin object at 0x7f6fd02aac10}
2023-11-29 22:56:42,205 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_progress(114)]: The current task completion progress is 100
2023-11-29 22:56:42,207 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:process_new_task(147)]: The raw task results: {'results': [0]}
2023-11-29 22:56:42,208 [INFO][app_logger.py::add_to_log(116)] [task.py:_send_task_update(278)]: Task has received notification to send task update
2023-11-29 22:56:42,210 [ERROR][app_logger.py::add_to_log(126)] [task_processing.py:process_new_task(167)]: Failed output schema validation: Task Results: {'results': [0]}
2023-11-29 22:56:42,210 [WARNING][app_logger.py::add_to_log(121)] [task_processing.py:process_new_task(189)]: The task terminated: The algorithm output schema validation failed
2023-11-29 22:56:42,210 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Running', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '""', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 100}
2023-11-29 22:56:42,211 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_progress(114)]: The current task completion progress is 100
2023-11-29 22:56:42,211 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_status(146)]: The current task has changed status to TaskStatus.ERROR
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Error', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '', 'errorMessages': '[{"category": "SYSTEM_ERROR", "code": "CSYSx00146", "description": "Task Terminated: The algorithm output schema validation failed", "severity": "warning", "component": "task_processing.py"}]', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 100}
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_acknowledgement(659)]: The acknowledgement sent successfully - 1701269801989-0
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [task.py:cleanup(104)]: The system has received notification to clean up task
Additional Context
NA
Possible Solution (Optional)
No response