Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sector and income #230

Merged
merged 29 commits into from
Oct 27, 2023
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
de623fd
Add SectorPrevalence to KevinHall.json config.
jamesturner246 Oct 20, 2023
d638cef
Load Rural Prevalence into model parser from JSON.
jamesturner246 Oct 20, 2023
585e936
Load Rural Prevalence into KH model.
jamesturner246 Oct 20, 2023
1922906
We want all our RF models' generate_ methods to run.
jamesturner246 Oct 20, 2023
3a6534e
Add stub initialise_sector method to KH model class.
jamesturner246 Oct 20, 2023
e5a312c
Add enum type for sector (region).
jamesturner246 Oct 20, 2023
87c4eac
Add 'unknown' sector enum type, and inititalise people to unknown sec…
jamesturner246 Oct 20, 2023
5f9a7fe
Finish initialise_sector code in KevinHall.
jamesturner246 Oct 20, 2023
eb35572
Clarify that we are using rural prevalence here.
jamesturner246 Oct 20, 2023
c404c0a
Don't std move trivial type
jamesturner246 Oct 20, 2023
f4d6638
model parser clang-tidy.
jamesturner246 Oct 23, 2023
b4d72ab
Add Income model parameters to KevinHall config JSON.
jamesturner246 Oct 23, 2023
d7f5b61
Load income models from JSON in mdoel parser.
jamesturner246 Oct 23, 2023
7c0846f
Load income models from mdoel parser into kevin hall model.
jamesturner246 Oct 23, 2023
1aea169
Add missing category_1 income model.
jamesturner246 Oct 24, 2023
3c6ad90
LinearModelParams should be a vector.
jamesturner246 Oct 24, 2023
c72c2bc
Add initialise_income to kevin hall model.
jamesturner246 Oct 24, 2023
61ab5b7
Fix read after std::move.
jamesturner246 Oct 25, 2023
06e553e
Call initialise_income in KH model.
jamesturner246 Oct 25, 2023
e373e53
Add sector update code to kevinhall.
jamesturner246 Oct 25, 2023
f5a22d2
Code to convert sector to a real, for use as a coefficient. Add error…
jamesturner246 Oct 25, 2023
2326ca9
Age group name consistency.
jamesturner246 Oct 25, 2023
3985efa
Add person method to check over 18.
jamesturner246 Oct 25, 2023
905223a
Add kevinhall code to update income.
jamesturner246 Oct 25, 2023
71926bd
Change weird if
jamesturner246 Oct 26, 2023
d32fd44
nutrient_ranges is now a Double interval, interval now has its own cl…
jamesturner246 Oct 26, 2023
a44f19c
KH: rural_prevalence should be an unordered_map.
jamesturner246 Oct 26, 2023
0ab8ef6
KH: use vector reserve, rather than letting it zero init.
jamesturner246 Oct 26, 2023
1ebaa1a
KH: push_back, don't assign, into reserved vector.
jamesturner246 Oct 26, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 37 additions & 0 deletions example_new/KevinHall.json
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,43 @@
"Description": "string"
}
},
"RuralPrevalence": [
{
"AgeRange": [0, 17],
"Female": 0.65,
"Male": 0.65
},
{
"AgeRange": [18, 100],
"Female": 0.6,
"Male": 0.6
}
],
"IncomeModels": [
{
"Name": "Category_1",
"Intercept": 1.0,
"Coefficients": {}
},
{
"Name": "Category_2",
"Intercept": 0.75498278113169,
"Coefficients": {
"Sex": 0.0204338261883629,
"Over18": 0.0486699373325097,
"Sector": -0.706477682886734
}
},
{
"Name": "Category_3",
"Intercept": 1.48856946873517,
"Coefficients": {
"Sex": 0.000641255498596025,
"Over18": 0.206622749570132,
"Sector": -1.71313287940798
}
}
],
"AgeMeanHeight": {
"Male": [
0.498, 0.757, 0.868, 0.952, 1.023, 1.092, 1.155, 1.219, 1.28, 1.333,
Expand Down
88 changes: 64 additions & 24 deletions src/HealthGPS.Console/model_parser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -129,31 +129,50 @@ std::unique_ptr<hgps::StaticLinearModelDefinition>
load_staticlinear_risk_model_definition(const poco::json &opt, const host::Configuration &config) {
MEASURE_FUNCTION();

// Risk factor linear models.
hgps::LinearModelParams models;
for (const auto &factor : opt["RiskFactorModels"]) {
auto factor_key = factor["Name"].get<hgps::core::Identifier>();
models.intercepts[factor_key] = factor["Intercept"].get<double>();
models.coefficients[factor_key] =
factor["Coefficients"].get<std::unordered_map<hgps::core::Identifier, double>>();
}

// Risk factor names and correlation matrix.
std::vector<hgps::core::Identifier> names;
// Risk factor linear models and correlation matrix.
std::vector<hgps::LinearModelParams> risk_factor_models;
const auto correlations_file_info =
host::get_file_info(opt["RiskFactorCorrelationFile"], config.root_path);
const auto correlations_table = load_datatable_from_csv(correlations_file_info);
Eigen::MatrixXd correlations{correlations_table.num_rows(), correlations_table.num_columns()};
for (size_t col = 0; col < correlations_table.num_columns(); col++) {
names.emplace_back(correlations_table.column(col).name());
for (size_t row = 0; row < correlations_table.num_rows(); row++) {
correlations(row, col) =
std::any_cast<double>(correlations_table.column(col).value(row));

for (size_t i = 0; i < opt["RiskFactorModels"].size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debatable as to whether this would be an improvement, but FYI you can zip ranges together in C++20, as you would with Python: https://en.cppreference.com/w/cpp/ranges/zip_view

In this case you could zip opt["RiskFactorModels"] and correlations_table (which is also a range because it has cbegin and cend methods) and then you avoid defining i.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a C++23 feature :(

// Risk factor model.
const auto &factor = opt["RiskFactorModels"][i];
hgps::LinearModelParams model;
model.name = factor["Name"].get<hgps::core::Identifier>();
model.intercept = factor["Intercept"].get<double>();
model.coefficients =
factor["Coefficients"].get<std::unordered_map<hgps::core::Identifier, double>>();
risk_factor_models.emplace_back(std::move(model));

// Correlation matrix column.
for (size_t j = 0; j < correlations_table.num_rows(); j++) {
correlations(i, j) = std::any_cast<double>(correlations_table.column(i).value(j));
}

// Check correlation matrix column name matches risk factor name.
auto column_name = hgps::core::Identifier{correlations_table.column(i).name()};
if (model.name != column_name) {
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
throw hgps::core::HgpsException{
fmt::format("Risk factor {} name ({}) does not match correlation matrix "
"column {} name ({})",
i, model.name.to_string(), i, column_name.to_string())};
}
}

// Check correlation matrix column count matches risk factor count.
if (opt["RiskFactorModels"].size() != correlations_table.num_columns()) {
throw hgps::core::HgpsException{
fmt::format("Risk factor count ({}) does not match correlation "
"matrix column count ({})",
opt["RiskFactorModels"].size(), correlations_table.num_columns())};
}

// Compute Cholesky decomposition of correlation matrix.
auto cholesky = Eigen::MatrixXd{Eigen::LLT<Eigen::MatrixXd>{correlations}.matrixL()};

return std::make_unique<hgps::StaticLinearModelDefinition>(std::move(names), std::move(models),
return std::make_unique<hgps::StaticLinearModelDefinition>(std::move(risk_factor_models),
std::move(cholesky));
}

Expand Down Expand Up @@ -253,14 +272,10 @@ load_ebhlm_risk_model_definition(const poco::json &opt) {
std::unique_ptr<hgps::KevinHallModelDefinition>
load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configuration &config) {
MEASURE_FUNCTION();
std::unordered_map<hgps::core::Identifier, double> energy_equation;
std::unordered_map<hgps::core::Identifier, std::pair<double, double>> nutrient_ranges;
std::unordered_map<hgps::core::Identifier, std::map<hgps::core::Identifier, double>>
nutrient_equations;
std::unordered_map<hgps::core::Identifier, std::optional<double>> food_prices;
std::unordered_map<hgps::core::Gender, std::vector<double>> age_mean_height;

// Nutrient groups.
std::unordered_map<hgps::core::Identifier, double> energy_equation;
std::unordered_map<hgps::core::Identifier, std::pair<double, double>> nutrient_ranges;
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
for (const auto &nutrient : opt["Nutrients"]) {
auto nutrient_key = nutrient["Name"].get<hgps::core::Identifier>();
nutrient_ranges[nutrient_key] = nutrient["Range"].get<std::pair<double, double>>();
Expand All @@ -272,6 +287,9 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur
}

// Food groups.
std::unordered_map<hgps::core::Identifier, std::map<hgps::core::Identifier, double>>
nutrient_equations;
std::unordered_map<hgps::core::Identifier, std::optional<double>> food_prices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using floating-point numbers for currency is a common gotcha, because of rounding errors. Do you think it would work to use an integer type instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I didn't do that as I didn't think much arithmetic would happen with them. They aren't used yet, and will presumably be just logged for financial impact of interventions. They can be changed to unsigned integral types for pence/cents/... I'll leave that for another time, if and as needed.

for (const auto &food : opt["Foods"]) {
auto food_key = food["Name"].get<hgps::core::Identifier>();
food_prices[food_key] = food["Price"].get<std::optional<double>>();
Expand All @@ -291,7 +309,28 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur
const auto food_data_file_info = host::get_file_info(opt["FoodsDataFile"], config.root_path);
const auto food_data_table = load_datatable_from_csv(food_data_file_info);

// Rural sector prevalence for age groups and sex.
std::map<hgps::core::IntegerInterval, std::unordered_map<hgps::core::Gender, double>>
rural_prevalence;
for (const auto &age_group : opt["RuralPrevalence"]) {
auto age_group_range = age_group["AgeRange"].get<hgps::core::IntegerInterval>();
rural_prevalence[age_group_range] = {{hgps::core::Gender::female, age_group["Female"]},
{hgps::core::Gender::male, age_group["Male"]}};
}

// Income models for different income classifications.
std::vector<hgps::LinearModelParams> income_models;
for (const auto &factor : opt["IncomeModels"]) {
hgps::LinearModelParams model;
model.name = factor["Name"].get<hgps::core::Identifier>();
model.intercept = factor["Intercept"].get<double>();
model.coefficients =
factor["Coefficients"].get<std::unordered_map<hgps::core::Identifier, double>>();
income_models.emplace_back(std::move(model));
}

// Load M/F average heights for age.
std::unordered_map<hgps::core::Gender, std::vector<double>> age_mean_height;
const auto max_age = static_cast<size_t>(config.settings.age_range.upper());
auto male_height = opt["AgeMeanHeight"]["Male"].get<std::vector<double>>();
auto female_height = opt["AgeMeanHeight"]["Female"].get<std::vector<double>>();
Expand All @@ -306,7 +345,8 @@ load_kevinhall_risk_model_definition(const poco::json &opt, const host::Configur

return std::make_unique<hgps::KevinHallModelDefinition>(
std::move(energy_equation), std::move(nutrient_ranges), std::move(nutrient_equations),
std::move(food_prices), std::move(age_mean_height));
std::move(food_prices), std::move(rural_prevalence), std::move(income_models),
std::move(age_mean_height));
}

std::pair<hgps::RiskFactorModelType, std::unique_ptr<hgps::RiskFactorModelDefinition>>
Expand Down
13 changes: 13 additions & 0 deletions src/HealthGPS.Core/forward_type.h
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,18 @@ enum class DiseaseGroup : uint8_t {
cancer
};

/// @brief Enumerates sector types
enum class Sector : uint8_t {
/// @brief Unknown sector
unknown,

/// @brief Urban sector
urban,

/// @brief Rural sector
rural
};

/// @brief C++20 concept for numeric columns types
template <typename T>
concept Numerical = std::is_arithmetic_v<T>;
Expand All @@ -60,4 +72,5 @@ class DoubleDataTableColumn;
class IntegerDataTableColumn;

class DataTableColumnVisitor;

} // namespace hgps::core
5 changes: 1 addition & 4 deletions src/HealthGPS/dynamic_hierarchical_linear_model.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,7 @@ RiskFactorModelType DynamicHierarchicalLinearModel::type() const noexcept {
std::string DynamicHierarchicalLinearModel::name() const noexcept { return "Dynamic"; }

void DynamicHierarchicalLinearModel::generate_risk_factors(
[[maybe_unused]] RuntimeContext &context) {
throw core::HgpsException(
"DynamicHierarchicalLinearModel::generate_risk_factors not yet implemented.");
}
[[maybe_unused]] RuntimeContext &context) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this model not need to generate risk factors or is this just something you haven't implemented yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't generate risk factors. That's done in e.g. StaticHierarchicalLinearModel. Alas, the confusion of using the generate (static) model and update (dynamic) model terminology when both models have generate and update methods. That's why I will do #231.


void DynamicHierarchicalLinearModel::update_risk_factors(RuntimeContext &context) {
auto age_key = core::Identifier{"age"};
Expand Down
1 change: 0 additions & 1 deletion src/HealthGPS/dynamic_hierarchical_linear_model.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,6 @@ class DynamicHierarchicalLinearModel final : public RiskFactorModel {

std::string name() const noexcept override;

/// @throws std::logic_error the dynamic model does not generate risk factors.
void generate_risk_factors(RuntimeContext &context) override;

void update_risk_factors(RuntimeContext &context) override;
Expand Down
101 changes: 96 additions & 5 deletions src/HealthGPS/kevin_hall_model.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,9 +33,13 @@ KevinHallModel::KevinHallModel(
const std::unordered_map<core::Identifier, std::map<core::Identifier, double>>
&nutrient_equations,
const std::unordered_map<core::Identifier, std::optional<double>> &food_prices,
const std::map<hgps::core::IntegerInterval, std::unordered_map<hgps::core::Gender, double>>
&rural_prevalence,
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
const std::vector<LinearModelParams> &income_models,
const std::unordered_map<core::Gender, std::vector<double>> &age_mean_height)
: energy_equation_{energy_equation}, nutrient_ranges_{nutrient_ranges},
nutrient_equations_{nutrient_equations}, food_prices_{food_prices},
rural_prevalence_{rural_prevalence}, income_models_{income_models},
age_mean_height_{age_mean_height} {

if (energy_equation_.empty()) {
Expand All @@ -50,6 +54,12 @@ KevinHallModel::KevinHallModel(
if (food_prices_.empty()) {
throw core::HgpsException("Food price mapping is empty");
}
if (rural_prevalence_.empty()) {
throw core::HgpsException("Rural prevalence mapping is empty");
}
if (income_models_.empty()) {
throw core::HgpsException("Income models list is empty");
}
if (age_mean_height_.empty()) {
throw core::HgpsException("Age mean height mapping is empty");
}
Expand All @@ -59,19 +69,34 @@ RiskFactorModelType KevinHallModel::type() const noexcept { return RiskFactorMod

std::string KevinHallModel::name() const noexcept { return "Dynamic"; }

void KevinHallModel::generate_risk_factors([[maybe_unused]] RuntimeContext &context) {
throw core::HgpsException("KevinHallModel::generate_risk_factors not yet implemented.");
void KevinHallModel::generate_risk_factors(RuntimeContext &context) {

// Initialise sector for everyone.
for (auto &person : context.population()) {
initialise_sector(context, person);
}
}

void KevinHallModel::update_risk_factors(RuntimeContext &context) {
hgps::Population &population = context.population();
double mean_sim_body_weight = 0.0;
double mean_adjustment_coefficient = 0.0;

// Initialise sector for newborns.
for (auto &person : population) {

// Do not initialise non-newborns.
if (person.age > 0) {
continue;
}

initialise_sector(context, person);
}

// TODO: Compute target body weight.
const float target_BW = 100.0;

// Trial run.
double mean_sim_body_weight = 0.0;
double mean_adjustment_coefficient = 0.0;
for (auto &person : population) {
// Ignore if inactive.
if (!person.is_active()) {
Expand Down Expand Up @@ -114,6 +139,61 @@ void KevinHallModel::update_risk_factors(RuntimeContext &context) {
}
}

void KevinHallModel::initialise_sector(RuntimeContext &context, Person &person) const {

// Get prevalence for age and sex (default is zero).
double rural_prevalence = 0.0;
for (const auto &[age_range, prevalence_by_sex] : rural_prevalence_) {
if (age_range.contains(person.age)) {
rural_prevalence = prevalence_by_sex.at(person.gender);
break;
}
}
alexdewar marked this conversation as resolved.
Show resolved Hide resolved

// Sample the person's sector.
double rand = context.random().next_double();
auto sector = rand < rural_prevalence ? core::Sector::rural : core::Sector::urban;
person.sector = sector;
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
}

void KevinHallModel::initialise_income(RuntimeContext &context, Person &person) const {

// Compute logits for each income category.
auto logits = std::vector<double>(income_models_.size());
for (size_t i = 0; i < income_models_.size(); i++) {
logits[i] = income_models_[i].intercept;
for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) {
logits[i] += coefficient * person.get_risk_factor_value(factor_name);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative implementation using ranges (you'll also need #include <ranges>):

Suggested change
auto logits = std::vector<double>(income_models_.size());
for (size_t i = 0; i < income_models_.size(); i++) {
logits[i] = income_models_[i].intercept;
for (const auto &[factor_name, coefficient] : income_models_[i].coefficients) {
logits[i] += coefficient * person.get_risk_factor_value(factor_name);
}
}
auto logits = std::vector<double>{};
logits.reserve(income_models_.size());
std::ranges::transform(income_models_, std::back_inserter(logits),
[&person](const auto &model) {
double logit = model.intercept;
for (const auto &[factor_name, coefficient] : model.coefficients) {
logit += coefficient * person.get_risk_factor_value(factor_name);
}
return logit;
});


// Compute softmax probabilities for each income category.
auto e_logits = std::vector<double>(income_models_.size());
double e_logits_sum = 0.0;
for (size_t i = 0; i < income_models_.size(); i++) {
e_logits[i] = exp(logits[i]);
e_logits_sum += e_logits[i];
}

// Compute income category probabilities.
auto probabilities = std::vector<double>(income_models_.size());
for (size_t i = 0; i < income_models_.size(); i++) {
probabilities[i] = e_logits[i] / e_logits_sum;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could also use std::ranges::transform, though the way you've done it is fine too!

One thing I would say is that generally it's better to append to std::vectors iteratively rather than creating them with a particular size upfront (it means you can skip the zeroing of the memory). You can use the reserve() method to hint at how many elements it will have eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough with reserve, but I think I prefer the readability of for loops for such simple ops.


// Compute income category.
double rand = context.random().next_double();
for (size_t i = 0; i < probabilities.size(); i++) {
if (rand < probabilities[i]) {
person.income = income_models_[i].name;
return;
}
rand -= probabilities[i];
}

throw core::HgpsException("Logic Error: failed to initialise income category");
}

SimulatePersonState KevinHallModel::simulate_person(Person &person, double shift) const {
// Initial simulated person state.
const double H_0 = person.get_risk_factor_value(H_key);
Expand Down Expand Up @@ -280,9 +360,13 @@ KevinHallModelDefinition::KevinHallModelDefinition(
std::unordered_map<core::Identifier, std::pair<double, double>> nutrient_ranges,
std::unordered_map<core::Identifier, std::map<core::Identifier, double>> nutrient_equations,
std::unordered_map<core::Identifier, std::optional<double>> food_prices,
std::map<hgps::core::IntegerInterval, std::unordered_map<hgps::core::Gender, double>>
rural_prevalence,
std::vector<LinearModelParams> income_models,
std::unordered_map<core::Gender, std::vector<double>> age_mean_height)
: energy_equation_{std::move(energy_equation)}, nutrient_ranges_{std::move(nutrient_ranges)},
nutrient_equations_{std::move(nutrient_equations)}, food_prices_{std::move(food_prices)},
rural_prevalence_{std::move(rural_prevalence)}, income_models_{std::move(income_models)},
age_mean_height_{std::move(age_mean_height)} {

if (energy_equation_.empty()) {
Expand All @@ -297,14 +381,21 @@ KevinHallModelDefinition::KevinHallModelDefinition(
if (food_prices_.empty()) {
throw core::HgpsException("Food prices mapping is empty");
}
if (rural_prevalence_.empty()) {
throw core::HgpsException("Rural prevalence mapping is empty");
}
if (income_models_.empty()) {
throw core::HgpsException("Income models list is empty");
}
if (age_mean_height_.empty()) {
throw core::HgpsException("Age mean height mapping is empty");
}
}

std::unique_ptr<RiskFactorModel> KevinHallModelDefinition::create_model() const {
return std::make_unique<KevinHallModel>(energy_equation_, nutrient_ranges_, nutrient_equations_,
food_prices_, age_mean_height_);
food_prices_, rural_prevalence_, income_models_,
age_mean_height_);
}

} // namespace hgps
Expand Down
Loading
Loading