Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sector and income #230

Merged
merged 29 commits into from
Oct 27, 2023
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
de623fd
Add SectorPrevalence to KevinHall.json config.
jamesturner246 Oct 20, 2023
d638cef
Load Rural Prevalence into model parser from JSON.
jamesturner246 Oct 20, 2023
585e936
Load Rural Prevalence into KH model.
jamesturner246 Oct 20, 2023
1922906
We want all our RF models' generate_ methods to run.
jamesturner246 Oct 20, 2023
3a6534e
Add stub initialise_sector method to KH model class.
jamesturner246 Oct 20, 2023
e5a312c
Add enum type for sector (region).
jamesturner246 Oct 20, 2023
87c4eac
Add 'unknown' sector enum type, and inititalise people to unknown sec…
jamesturner246 Oct 20, 2023
5f9a7fe
Finish initialise_sector code in KevinHall.
jamesturner246 Oct 20, 2023
eb35572
Clarify that we are using rural prevalence here.
jamesturner246 Oct 20, 2023
c404c0a
Don't std move trivial type
jamesturner246 Oct 20, 2023
f4d6638
model parser clang-tidy.
jamesturner246 Oct 23, 2023
b4d72ab
Add Income model parameters to KevinHall config JSON.
jamesturner246 Oct 23, 2023
d7f5b61
Load income models from JSON in mdoel parser.
jamesturner246 Oct 23, 2023
7c0846f
Load income models from mdoel parser into kevin hall model.
jamesturner246 Oct 23, 2023
1aea169
Add missing category_1 income model.
jamesturner246 Oct 24, 2023
3c6ad90
LinearModelParams should be a vector.
jamesturner246 Oct 24, 2023
c72c2bc
Add initialise_income to kevin hall model.
jamesturner246 Oct 24, 2023
61ab5b7
Fix read after std::move.
jamesturner246 Oct 25, 2023
06e553e
Call initialise_income in KH model.
jamesturner246 Oct 25, 2023
e373e53
Add sector update code to kevinhall.
jamesturner246 Oct 25, 2023
f5a22d2
Code to convert sector to a real, for use as a coefficient. Add error…
jamesturner246 Oct 25, 2023
2326ca9
Age group name consistency.
jamesturner246 Oct 25, 2023
3985efa
Add person method to check over 18.
jamesturner246 Oct 25, 2023
905223a
Add kevinhall code to update income.
jamesturner246 Oct 25, 2023
71926bd
Change weird if
jamesturner246 Oct 26, 2023
d32fd44
nutrient_ranges is now a Double interval, interval now has its own cl…
jamesturner246 Oct 26, 2023
a44f19c
KH: rural_prevalence should be an unordered_map.
jamesturner246 Oct 26, 2023
0ab8ef6
KH: use vector reserve, rather than letting it zero init.
jamesturner246 Oct 26, 2023
1ebaa1a
KH: push_back, don't assign, into reserved vector.
jamesturner246 Oct 26, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 37 additions & 0 deletions example_new/KevinHall.json
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,43 @@
"Description": "string"
}
},
"RuralPrevalence": [
{
"Name": "Under18",
"Female": 0.65,
"Male": 0.65
},
{
"Name": "Over18",
"Female": 0.6,
"Male": 0.6
}
],
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
"IncomeModels": [
{
"Name": "Category_1",
"Intercept": 1.0,
"Coefficients": {}
},
{
"Name": "Category_2",
"Intercept": 0.75498278113169,
"Coefficients": {
"Gender": 0.0204338261883629,
"Over18": 0.0486699373325097,
"Sector": -0.706477682886734
}
},
{
"Name": "Category_3",
"Intercept": 1.48856946873517,
"Coefficients": {
"Gender": 0.000641255498596025,
"Over18": 0.206622749570132,
"Sector": -1.71313287940798
}
}
],
"AgeMeanHeight": {
"Male": [
0.498, 0.757, 0.868, 0.952, 1.023, 1.092, 1.155, 1.219, 1.28, 1.333,
Expand Down
88 changes: 64 additions & 24 deletions src/HealthGPS.Console/model_parser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -129,31 +129,50 @@ std::unique_ptr<hgps::StaticLinearModelDefinition>
load_staticlinear_risk_model_definition(const poco::json &opt, const host::Configuration &config) {
MEASURE_FUNCTION();

// Risk factor linear models.
hgps::LinearModelParams models;
for (const auto &factor : opt["RiskFactorModels"]) {
auto factor_key = factor["Name"].get<hgps::core::Identifier>();
models.intercepts[factor_key] = factor["Intercept"].get<double>();
models.coefficients[factor_key] =
factor["Coefficients"].get<std::unordered_map<hgps::core::Identifier, double>>();
}

// Risk factor names and correlation matrix.
std::vector<hgps::core::Identifier> names;
// Risk factor linear models and correlation matrix.
std::vector<hgps::LinearModelParams> risk_factor_models;
const auto correlations_file_info =
host::get_file_info(opt["RiskFactorCorrelationFile"], config.root_path);
const auto correlations_table = load_datatable_from_csv(correlations_file_info);
Eigen::MatrixXd correlations{correlations_table.num_rows(), correlations_table.num_columns()};
for (size_t col = 0; col < correlations_table.num_columns(); col++) {
names.emplace_back(correlations_table.column(col).name());
for (size_t row = 0; row < correlations_table.num_rows(); row++) {
correlations(row, col) =
std::any_cast<double>(correlations_table.column(col).value(row));

for (size_t i = 0; i < opt["RiskFactorModels"].size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debatable as to whether this would be an improvement, but FYI you can zip ranges together in C++20, as you would with Python: https://en.cppreference.com/w/cpp/ranges/zip_view

In this case you could zip opt["RiskFactorModels"] and correlations_table (which is also a range because it has cbegin and cend methods) and then you avoid defining i.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a C++23 feature :(

// Risk factor model.
const auto &factor = opt["RiskFactorModels"][i];
hgps::LinearModelParams model;
model.name = factor["Name"].get<hgps::core::Identifier>();
model.intercept = factor["Intercept"].get<double>();
model.coefficients =
factor["Coefficients"].get<std::unordered_map<hgps::core::Identifier, double>>();

// Check correlation matrix column name matches risk factor name.
auto column_name = hgps::core::Identifier{correlations_table.column(i).name()};
if (model.name != column_name) {
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
throw hgps::core::HgpsException{
fmt::format("Risk factor {} name ({}) does not match correlation matrix "
"column {} name ({})",
i, model.name.to_string(), i, column_name.to_string())};
}

// Write data structures.
for (size_t j = 0; j < correlations_table.num_rows(); j++) {
correlations(i, j) = std::any_cast<double>(correlations_table.column(i).value(j));
}
risk_factor_models.emplace_back(std::move(model));
}

// Check correlation matrix column count matches risk factor count.
if (opt["RiskFactorModels"].size() != correlations_table.num_columns()) {
throw hgps::core::HgpsException{
fmt::format("Risk factor count ({}) does not match correlation "
"matrix column count ({})",
opt["RiskFactorModels"].size(), correlations_table.num_columns())};
}

// Compute Cholesky decomposition of correlation matrix.
auto cholesky = Eigen::MatrixXd{Eigen::LLT<Eigen::MatrixXd>{correlations}.matrixL()};

return std::make_unique<hgps::StaticLinearModelDefinition>(std::move(names), std::move(models),
return std::make_unique<hgps::StaticLinearModelDefinition>(std::move(risk_factor_models),
std::move(cholesky));
}

Expand Down Expand Up @@ -253,14 +272,10 @@ load_ebhlm_risk_model_definition(const poco::json &opt) {
std::unique_ptr<hgps::KevinHallModelDefinition>
load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configuration &config) {
MEASURE_FUNCTION();
std::unordered_map<hgps::core::Identifier, double> energy_equation;
std::unordered_map<hgps::core::Identifier, std::pair<double, double>> nutrient_ranges;
std::unordered_map<hgps::core::Identifier, std::map<hgps::core::Identifier, double>>
nutrient_equations;
std::unordered_map<hgps::core::Identifier, std::optional<double>> food_prices;
std::unordered_map<hgps::core::Gender, std::vector<double>> age_mean_height;

// Nutrient groups.
std::unordered_map<hgps::core::Identifier, double> energy_equation;
std::unordered_map<hgps::core::Identifier, std::pair<double, double>> nutrient_ranges;
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
for (const auto &nutrient : opt["Nutrients"]) {
auto nutrient_key = nutrient["Name"].get<hgps::core::Identifier>();
nutrient_ranges[nutrient_key] = nutrient["Range"].get<std::pair<double, double>>();
Expand All @@ -272,6 +287,9 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur
}

// Food groups.
std::unordered_map<hgps::core::Identifier, std::map<hgps::core::Identifier, double>>
nutrient_equations;
std::unordered_map<hgps::core::Identifier, std::optional<double>> food_prices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using floating-point numbers for currency is a common gotcha, because of rounding errors. Do you think it would work to use an integer type instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I didn't do that as I didn't think much arithmetic would happen with them. They aren't used yet, and will presumably be just logged for financial impact of interventions. They can be changed to unsigned integral types for pence/cents/... I'll leave that for another time, if and as needed.

for (const auto &food : opt["Foods"]) {
auto food_key = food["Name"].get<hgps::core::Identifier>();
food_prices[food_key] = food["Price"].get<std::optional<double>>();
Expand All @@ -291,7 +309,28 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur
const auto food_data_file_info = host::get_file_info(opt["FoodsDataFile"], config.root_path);
const auto food_data_table = load_datatable_from_csv(food_data_file_info);

// Rural sector prevalence for age groups and sex.
std::map<hgps::core::Identifier, std::unordered_map<hgps::core::Gender, double>>
rural_prevalence;
for (const auto &age_group : opt["RuralPrevalence"]) {
auto age_group_name = age_group["Name"].get<hgps::core::Identifier>();
rural_prevalence[age_group_name] = {{hgps::core::Gender::female, age_group["Female"]},
{hgps::core::Gender::male, age_group["Male"]}};
}

// Income models for different income classifications.
std::vector<hgps::LinearModelParams> income_models;
for (const auto &factor : opt["IncomeModels"]) {
hgps::LinearModelParams model;
model.name = factor["Name"].get<hgps::core::Identifier>();
model.intercept = factor["Intercept"].get<double>();
model.coefficients =
factor["Coefficients"].get<std::unordered_map<hgps::core::Identifier, double>>();
income_models.emplace_back(std::move(model));
}

// Load M/F average heights for age.
std::unordered_map<hgps::core::Gender, std::vector<double>> age_mean_height;
const auto max_age = static_cast<size_t>(config.settings.age_range.upper());
auto male_height = opt["AgeMeanHeight"]["Male"].get<std::vector<double>>();
auto female_height = opt["AgeMeanHeight"]["Female"].get<std::vector<double>>();
Expand All @@ -306,7 +345,8 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur

return std::make_unique<hgps::KevinHallModelDefinition>(
std::move(energy_equation), std::move(nutrient_ranges), std::move(nutrient_equations),
std::move(food_prices), std::move(age_mean_height));
std::move(food_prices), std::move(rural_prevalence), std::move(income_models),
std::move(age_mean_height));
}

std::pair<hgps::RiskFactorModelType, std::unique_ptr<hgps::RiskFactorModelDefinition>>
Expand Down
13 changes: 13 additions & 0 deletions src/HealthGPS.Core/forward_type.h
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,18 @@ enum class DiseaseGroup : uint8_t {
cancer
};

/// @brief Enumerates sector types
enum class Sector : uint8_t {
/// @brief Unknown sector
unknown,

/// @brief Urban sector
urban,

/// @brief Rural sector
rural
};

/// @brief C++20 concept for numeric columns types
template <typename T>
concept Numerical = std::is_arithmetic_v<T>;
Expand All @@ -60,4 +72,5 @@ class DoubleDataTableColumn;
class IntegerDataTableColumn;

class DataTableColumnVisitor;

} // namespace hgps::core
5 changes: 1 addition & 4 deletions src/HealthGPS/dynamic_hierarchical_linear_model.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,7 @@ RiskFactorModelType DynamicHierarchicalLinearModel::type() const noexcept {
std::string DynamicHierarchicalLinearModel::name() const noexcept { return "Dynamic"; }

void DynamicHierarchicalLinearModel::generate_risk_factors(
[[maybe_unused]] RuntimeContext &context) {
throw core::HgpsException(
"DynamicHierarchicalLinearModel::generate_risk_factors not yet implemented.");
}
[[maybe_unused]] RuntimeContext &context) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this model not need to generate risk factors or is this just something you haven't implemented yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't generate risk factors. That's done in e.g. StaticHierarchicalLinearModel. Alas, the confusion of using the generate (static) model and update (dynamic) model terminology when both models have generate and update methods. That's why I will do #231.


void DynamicHierarchicalLinearModel::update_risk_factors(RuntimeContext &context) {
auto age_key = core::Identifier{"age"};
Expand Down
1 change: 0 additions & 1 deletion src/HealthGPS/dynamic_hierarchical_linear_model.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,6 @@ class DynamicHierarchicalLinearModel final : public RiskFactorModel {

std::string name() const noexcept override;

/// @throws std::logic_error the dynamic model does not generate risk factors.
void generate_risk_factors(RuntimeContext &context) override;

void update_risk_factors(RuntimeContext &context) override;
Expand Down
Loading
Loading