[attr_analyst] Attributing changes in target metrics to related metrics and calculating the contribution values of the related metrics.
- Flexible relationship configuration: Support for complex nested attribution relationship configuration via configuration files.
- Support multiple attribution models: Allow for the free combination of additive and multiplicative attribution models.
- Output of attribution process and results: Provide output of intermediate attribution results for flexible application of hierarchical attribution results.
# PyPI
pip install attr_analyst
test_relation_config_file.json
Attribution target index: amt The final result of attribution is to attribute the change value of the target index to the contribution of the change in the related index.
Attribution dimension index: store_name, category_name According to these index values, perform data association between the current DataFrame and the comparative DataFrame to calculate the change value.
Attribution related index: x1, x2, x3, x4, x5, x6
{
"label_column": "amt",
"dimension_columns": ["store_name", "category_name"],
"relations": {
"indexes": [
{
"indexes": ["x1", "x2"],
"relation": "+"
},
"x3",
{
"indexes": [
{
"indexes": ["x4", "x5"],
"relation": "*"
},
"x6"
],
"relation": "+"
}
],
"relation": "*"
}
}
target_df and compare_df should have the same column names and quantities.
# Explanation of core methods
def calculate_attr_from_config(
target_df: pd.DataFrame,
compare_df: pd.DataFrame,
relation_config_path: str
) -> tuple[pd.DataFrame, pd.DataFrame, float, float, float] :
"""
Calculates the attributes between the target DataFrame and the comparison DataFrame according to the configuration file.
Parameters:
target_df (pd.DataFrame): The target DataFrame.
compare_df (pd.DataFrame): The comparison DataFrame.
relation_config_path (str): The path of the relationship configuration file.
Returns:
tuple: A tuple containing the following:
- attr_output_df (pd.DataFrame): A DataFrame containing the final attribution calculation results, with the suffix "_c" for the corresponding index attribution results.
- attr_progress_df (pd.DataFrame): A DataFrame containing the attribution results of the intermediate calculation process.
- label_total_target (float): The total result of the attribution target in the current period.
- label_total_compare (float): The total result of the attribution target in the comparison period.
- label_total_rate (float): The rate of change of the attribution target.
"""
relation_config = read_relation_config(relation_config_path)
label_column = relation_config['label_column']
dimension_columns = relation_config['dimension_columns']
relations = relation_config['relations']
return calculate_attr(target_df, compare_df, label_column, dimension_columns, relations)
from attr_analyst import calculate_attr_from_config
import pandas as pd
relation_config_filepath = 'test_relation_config_file.json'
target_data = {
'x1': [1, 2, 3],
'x2': [4, 5, 6],
'x3': [7, 8, 9],
'x4': [1, 2, 3],
'x5': [4, 5, 6],
'x6': [4, 5, 6],
'amt': [1, 2, 3],
'store_name': ['a', 'b', 'c'],
'category_name': ['a', 'b', 'c']
}
target_df = pd.DataFrame(target_data)
compare_data = {
'x1': [2, 4, 6],
'x2': [5, 5, 8],
'x3': [8, 10, 9],
'x4': [2, 4, 6],
'x5': [4, 7, 9],
'x6': [3, 2, 4],
'amt': [5, 5, 4],
'store_name': ['a', 'b', 'c'],
'category_name': ['a', 'b', 'c']
}
compare_df = pd.DataFrame(compare_data)
attr_output_df, attr_progress_df, label_total_target, label_total_compare, label_total_rate = calculate_attr_from_config(target_df, compare_df, relation_config_filepath)
attr_output_df.to_excel('test_calculate_attr_from_config_output.xlsx', index=False)