Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Risk factor refactor #195

Merged
merged 7 commits into from
Aug 7, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 0 additions & 57 deletions example/France.EBHLM.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is autogenerated from a folder with 50+ files and the country data file, not sure we gain much by hand editing this, perhaps changing the script that generate this file would be more beneficial. Alternatively, you could remove this section from the config file and read from here instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the respective healthgps-tools script should also be updated. I'm considering a Python rewrite of the scripts, to sidestep the Windows R binding dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @israel-vieira.

Though we do want the config files in this repo to be up to date and working with the current version of the code for testing purposes. Whether we edit it directly or just the script for generating it isn't so important I think, as long as it's correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made an issue for later.

Original file line number Diff line number Diff line change
Expand Up @@ -7,63 +7,6 @@
"Alpha3": "FRA"
},
"BoundaryPercentage": 0.05,
"RiskFactors": [
{
"Name": "Gender",
"Level": 0,
"Range": [0, 1]
},
{
"Name": "Age",
"Level": 0,
"Range": [1, 87]
},
{
"Name": "Age2",
"Level": 0,
"Range": [1, 7569]
},
{
"Name": "Age3",
"Level": 0,
"Range": [1, 658503]
},
{
"Name": "SES",
"Level": 0,
"Range": [-2.316299, 2.296689]
},
{
"Name": "Sodium",
"Level": 1,
"Range": [1.127384, 8.656519]
},
{
"Name": "Protein",
"Level": 1,
"Range": [43.50682, 238.4145]
},
{
"Name": "Fat",
"Level": 1,
"Range": [45.04756, 382.664098658922]
},
{
"Name": "PA",
"Level": 2,
"Range": [22.22314, 9765.512]
},
{
"Name": "Energy",
"Level": 2,
"Range": [1326.14051277588, 7522.496]
},
{
"Name": "BMI",
"Level": 3,
"Range": [13.88, 39.48983]
}
],
"Variables": [
{
"Name": "dPA",
Expand Down
8 changes: 1 addition & 7 deletions src/HealthGPS.Console/configuration.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of diseases can be long, 30+. The current validation will list all the invalid diseases key, if any, before exiting. The exception alternative will break at the first invalid key, the use will need to keep on trying until all are correct, in the end, both methods are accomplishing the required validation, it is just a matter of style.

Mapping the config with the back-end in a case-sensitive world is never user friend, the identifier type tries to minimize the pain by making all keys lowercase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case-sensitivity: fair point. In which case, the task then becomes removing the parts of the codebase that are case-sensitive. I've spotted a few.

Diseases: agree, a style choice. My concern in these cases has always been that by warning and continuing for dodgy input, that it makes the problem slightly less obvious to diagnose when it fails later on in the program, than if it were to immediately fail on loading. Perhaps this can be replaced with a function which reads all diseases, detects missing ones with printed debug info, before throwing an exception for any missing inputs, as opposed to a case-by-case exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @israel-vieira on this - collecting all that is invalid and providing a full list to the user (within meaningful indication of what is wrong) results on a better user experience than having to try again and again until all issues are sorted. We have another project where this is the approach and is indeed much better.

All in all, having a strong validation and input pre-processing steps where things can fail early and inconsistencies fixed (casing, proxis, etc.) is a good thing that will result in simpler, easier to understand (and diagnose) code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't consider users input validation as an exceptional circumstance, the validation should focus on helping the users to fix the wrong or missing information provided, perhaps the system should show valid options before exiting gracefully.

In general, users don't read or understand stack traces with a message hidden inside a large block of text, this is kind of information is more relevant for developers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two issues here:

  1. Whether to "fast fail" when loading the config or provide a list of failures at the end
  2. Whether to implement this using exceptions or some other mechanism (e.g. std::optional)

My view is:

  1. Yes, if the validation fails, we should continue attempting to load the file and provide a list of errors at the end, rather than aborting immediately (for the most part)
  2. @jamesturner246's refactor of get_country and get_disease_info to use exceptions is sensible and makes things clearer

The problem with at least some of the config loading code atm is that if validation fails, it doesn't abort immediately, but it also doesn't abort after the validation step either, so while you may get a warning on the console, the program will blithely plough on using default values, which is not good (see #161).

I'm a big believer in using exceptions to propagate errors in C++; it can make code much cleaner and safer. I'll give an example of how you might do "lazy failure" when loading disease info using exceptions:

std::vector<core::DiseaseInfo> get_diseases_info(core::Datastore &data_api, Configuration &config) {
    const auto diseases = data_api.get_diseases();
    fmt::print("\nThere are {} diseases in storage, {} selected.\n", diseases.size(),
               config.diseases.size());

    std::vector<core::DiseaseInfo> result;
    std::set<std::string> missing_diseases;
    for (const auto &code : config.diseases) {
        try {
            result.emplace_back(data_api.get_disease_info(code));
        } catch (const std::invalid_argument &) {
            missing_diseases.emplace(code);
        }
    }

    if (!missing_diseases.empty()) {
        std::cerr << "ERROR: The following diseases were not found:\n";
        for (const auto &disease : missing_diseases) {
            std::cerr << "\t" << disease << "\n";
        }

        throw std::runtime_error{"Some invalid diseases were given"};
    }

    return result;
}

(There might be things we could tweak -- e.g. I haven't used the fmt library -- but this is just to give an idea.)

This code assumes that the caller will appropriately handle any thrown std::runtime_error, which it absolutely should do (we might want to use a bespoke error type rather than std::runtime_error though). So it means you a) tell the user what errors have occurred and b) propagate the errors up the call stack. Ultimately, the main config loading function should be wrapped in a try...catch and if an error occurs, it just tells the user and exits the program (the user will already have been informed of the specific issues with the config loading by this point).

Note that this code is also safer, because a std::runtime_error will also be thrown if the "disease" value is missing, meaning that the caller knows that something has gone wrong, as opposed to the current state of affairs where just a warning message is displayed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with this ^. The user gets helpful human readable messages noting which keys were missing, and execution stops before there's a chance of accidental continuation on default values. I'm happy to merge, and happy to leave it with you if you already have the idea in mind.

Original file line number Diff line number Diff line change
Expand Up @@ -293,13 +293,7 @@ std::vector<core::DiseaseInfo> get_diseases_info(core::Datastore &data_api, Conf
config.diseases.size());

for (const auto &code : config.diseases) {
auto code_key = core::Identifier{code};
auto item = data_api.get_disease_info(code_key);
if (item.has_value()) {
result.emplace_back(item.value());
} else {
fmt::print(fg(fmt::color::salmon), "Unknown disease code: {}.\n", code);
}
result.emplace_back(data_api.get_disease_info(core::Identifier{code}));
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
}

return result;
Expand Down
1 change: 1 addition & 0 deletions src/HealthGPS.Console/csvparser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ create_fields_index_mapping(const std::vector<std::string> &column_names,

return mapping;
}

} // namespace detail

bool load_datatable_from_csv(hc::DataTable &out_table, std::string full_filename,
Expand Down
16 changes: 4 additions & 12 deletions src/HealthGPS.Console/program.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -66,17 +66,9 @@ int main(int argc, char *argv[]) { // NOLINT(bugprone-exception-escape)
auto factory = get_default_simulation_module_factory(data_repository);

// Validate the configuration's target country for the simulation
auto countries = data_api.get_countries();
fmt::print("\nThere are {} countries in storage.\n", countries.size());
auto target = data_api.get_country(config.settings.country);
if (target.has_value()) {
fmt::print("Target country: {} - {}, population: {:0.3g}%.\n", target.value().code,
target.value().name, config.settings.size_fraction * 100.0f);
} else {
fmt::print(fg(fmt::color::red), "\nTarget country: {} not found.\n",
config.settings.country);
return exit_application(EXIT_FAILURE);
}
auto country = data_api.get_country(config.settings.country);
fmt::print("Target country: {} - {}, population: {:0.3g}%.\n", country.code, country.name,
config.settings.size_fraction * 100.0f);

// Validate the configuration diseases list, must exists in back-end data store
auto diseases = get_diseases_info(data_api, config);
Expand All @@ -95,7 +87,7 @@ int main(int argc, char *argv[]) { // NOLINT(bugprone-exception-escape)
std::cout << input_table;

// Create complete model input from configuration
auto model_input = create_model_input(input_table, target.value(), config, diseases);
auto model_input = create_model_input(input_table, country, config, diseases);

// Create event bus and event monitor with a results file writer
auto event_bus = DefaultEventBus();
Expand Down
9 changes: 4 additions & 5 deletions src/HealthGPS.Core/datastore.h
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@

#include "interval.h"
#include "poco.h"
#include <optional>
#include <vector>

namespace hgps::core {
Expand All @@ -21,8 +20,8 @@ class Datastore {

/// @brief Gets a single country by the alpha code
/// @param alpha The country alpha 2 or 3 format code to search
/// @return The country's definition, if found, otherwise empty
virtual std::optional<Country> get_country(std::string alpha) const = 0;
/// @return The country's definition
virtual Country get_country(std::string alpha) const = 0;

/// @brief Gets the population growth trend for a country filtered by time
/// @param country The target country definition
Expand All @@ -46,8 +45,8 @@ class Datastore {

/// @brief Gets a single disease information by identifier
/// @param code The target disease identifier
/// @return The disease information, if found, otherwise empty
virtual std::optional<DiseaseInfo> get_disease_info(Identifier code) const = 0;
/// @return The disease information
virtual DiseaseInfo get_disease_info(Identifier code) const = 0;

/// @brief Gets a disease full definition by identifier for a country
/// @param info The target disease information
Expand Down
25 changes: 13 additions & 12 deletions src/HealthGPS.Datastore/datamanager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,18 +58,19 @@ std::vector<Country> DataManager::get_countries() const {
return results;
}

std::optional<Country> DataManager::get_country(std::string alpha) const {
auto v = get_countries();
auto is_target = [&alpha](const hgps::core::Country &obj) {
return core::case_insensitive::equals(obj.alpha2, alpha) ||
core::case_insensitive::equals(obj.alpha3, alpha);
Country DataManager::get_country(std::string alpha) const {
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
auto c = get_countries();
auto is_target = [&alpha](const hgps::core::Country &c) {
return core::case_insensitive::equals(c.alpha2, alpha) ||
core::case_insensitive::equals(c.alpha3, alpha);
};

if (auto it = std::find_if(v.begin(), v.end(), is_target); it != v.end()) {
return (*it);
auto country = std::find_if(c.begin(), c.end(), is_target);
if (country != c.end()) {
return *country;
}

return std::nullopt;
throw std::invalid_argument(fmt::format("Target country: '{}' not found.", alpha));
}

std::vector<PopulationItem> DataManager::get_population(Country country) const {
Expand Down Expand Up @@ -200,7 +201,7 @@ std::vector<DiseaseInfo> DataManager::get_diseases() const {
return result;
}

std::optional<DiseaseInfo> DataManager::get_disease_info(core::Identifier code) const {
DiseaseInfo DataManager::get_disease_info(core::Identifier code) const {
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
if (index_.contains("diseases")) {
auto &registry = index_["diseases"]["registry"];
auto disease_code_str = code.to_string();
Expand All @@ -221,11 +222,11 @@ std::optional<DiseaseInfo> DataManager::get_disease_info(core::Identifier code)
return info;
}
}
} else {
notify_warning("index has no 'diseases' entry.");

throw std::invalid_argument(fmt::format("Disease code: '{}' not found.", code.to_string()));
}

return std::optional<DiseaseInfo>();
throw std::runtime_error("Index has no 'diseases' entry.");
}

DiseaseEntity DataManager::get_disease(DiseaseInfo info, Country country) const {
Expand Down
7 changes: 3 additions & 4 deletions src/HealthGPS.Datastore/datamanager.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,15 +28,14 @@ class DataManager : public Datastore {
/// @brief Initialises a new instance of the hgps::data::DataManager class.
/// @param root_directory The store root folder containing the index.json file.
/// @param verbosity The terminal logging verbosity mode to use.
/// @throws std::invalid_argument if the given folder does exists or contains the index.json
/// file.
/// @throws std::invalid_argument if the root directory or index.json is missing.
/// @throws std::runtime_error for invalid or unsupported index.json file schema version.
explicit DataManager(const std::filesystem::path root_directory,
VerboseMode verbosity = VerboseMode::none);

std::vector<Country> get_countries() const override;

std::optional<Country> get_country(std::string alpha) const override;
Country get_country(std::string alpha) const override;

std::vector<PopulationItem> get_population(Country country) const;

Expand All @@ -52,7 +51,7 @@ class DataManager : public Datastore {

std::vector<DiseaseInfo> get_diseases() const override;

std::optional<DiseaseInfo> get_disease_info(core::Identifier code) const override;
DiseaseInfo get_disease_info(core::Identifier code) const override;

DiseaseEntity get_disease(DiseaseInfo code, Country country) const override;

Expand Down
65 changes: 28 additions & 37 deletions src/HealthGPS.Tests/Datastore.Test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,7 @@ class DatastoreTest : public ::testing::Test {
DatastoreTest() : manager{test_datastore_path} {}

hgps::data::DataManager manager;

// We don't need to check because this is just for testing
hgps::core::Country uk =
manager.get_country("GB").value(); // NOLINT(bugprone-unchecked-optional-access)
hgps::core::Country uk = manager.get_country("GB");
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
};

TEST_F(DatastoreTest, CreateDataManager) {
Expand All @@ -23,24 +20,20 @@ TEST_F(DatastoreTest, CreateDataManager) {
}

TEST_F(DatastoreTest, CreateDataManagerFailWithWrongPath) {
using namespace hgps::data;
ASSERT_THROW(DataManager{"C:\\x\\y"}, std::invalid_argument);
ASSERT_THROW(DataManager{"C:/x/y"}, std::invalid_argument);
ASSERT_THROW(DataManager{"/home/x/y/z"}, std::invalid_argument);
ASSERT_THROW(hgps::data::DataManager{"C:\\x\\y"}, std::invalid_argument);
ASSERT_THROW(hgps::data::DataManager{"C:/x/y"}, std::invalid_argument);
ASSERT_THROW(hgps::data::DataManager{"/home/x/y/z"}, std::invalid_argument);
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
}

TEST_F(DatastoreTest, CountryIsCaseInsensitive) {
auto countries = manager.get_countries();
auto gb2_lower = manager.get_country("gb");
auto gb2_upper = manager.get_country("GB");
auto gb3_lower = manager.get_country("gbr");
auto gb3_upper = manager.get_country("GBR");
TEST_F(DatastoreTest, CountryMissingThrowsException) {
ASSERT_THROW(manager.get_country("FAKE"), std::invalid_argument);
}

ASSERT_GT(countries.size(), 0);
ASSERT_TRUE(gb2_lower.has_value());
ASSERT_TRUE(gb2_upper.has_value());
ASSERT_TRUE(gb3_lower.has_value());
ASSERT_TRUE(gb3_upper.has_value());
TEST_F(DatastoreTest, CountryIsCaseInsensitive) {
ASSERT_NO_THROW(manager.get_country("gb"));
ASSERT_NO_THROW(manager.get_country("GB"));
ASSERT_NO_THROW(manager.get_country("gbr"));
ASSERT_NO_THROW(manager.get_country("GBR"));
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
}

TEST_F(DatastoreTest, CountryPopulation) {
Expand Down Expand Up @@ -130,15 +123,23 @@ TEST_F(DatastoreTest, CountryMortality) {
}
}

TEST_F(DatastoreTest, RetrieveDeseasesInfo) {
TEST_F(DatastoreTest, GetDiseases) {
auto diseases = manager.get_diseases();
ASSERT_GT(diseases.size(), 0);
}

TEST_F(DatastoreTest, GetDiseaseInfoMatchesGetDisases) {
auto diseases = manager.get_diseases();
for (auto &item : diseases) {
EXPECT_NO_THROW(manager.get_disease_info(item.code));
auto info = manager.get_disease_info(item.code);
EXPECT_TRUE(info.has_value());
EXPECT_EQ(item.code, info.value().code); // NOLINT(bugprone-unchecked-optional-access)
EXPECT_EQ(item.code, info.code);
}
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
}

ASSERT_GT(diseases.size(), 0);
TEST_F(DatastoreTest, GetDiseaseInfoMissingThrowsException) {
hgps::core::Identifier fake_code{"FAKE"};
EXPECT_THROW(manager.get_disease_info(fake_code), std::invalid_argument);
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
}

TEST_F(DatastoreTest, RetrieveDeseasesTypeInInfo) {
Expand All @@ -156,12 +157,6 @@ TEST_F(DatastoreTest, RetrieveDeseasesTypeInInfo) {
ASSERT_GT(cancer_count, 0);
}

TEST_F(DatastoreTest, RetrieveDeseasesInfoHasNoValue) {
using namespace hgps::core;
auto info = manager.get_disease_info(Identifier{"ghost369"});
EXPECT_FALSE(info.has_value());
}

TEST_F(DatastoreTest, RetrieveDeseaseDefinition) {
auto diseases = manager.get_diseases();
for (auto &item : diseases) {
Expand Down Expand Up @@ -194,10 +189,8 @@ TEST_F(DatastoreTest, RetrieveDeseaseDefinitionIsEmpty) {
TEST_F(DatastoreTest, DiseaseRelativeRiskToDisease) {
using namespace hgps::core;

// NOLINTBEGIN(bugprone-unchecked-optional-access)
auto asthma = manager.get_disease_info(Identifier{"asthma"}).value();
auto diabetes = manager.get_disease_info(Identifier{"diabetes"}).value();
// NOLINTEND(bugprone-unchecked-optional-access)
auto asthma = manager.get_disease_info(Identifier{"asthma"});
auto diabetes = manager.get_disease_info(Identifier{"diabetes"});
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved

auto table_self = manager.get_relative_risk_to_disease(diabetes, diabetes);
auto table_other = manager.get_relative_risk_to_disease(diabetes, asthma);
Expand All @@ -217,8 +210,7 @@ TEST_F(DatastoreTest, DiseaseRelativeRiskToDisease) {
TEST_F(DatastoreTest, DefaultDiseaseRelativeRiskToDisease) {
using namespace hgps::core;

// NOLINTNEXTLINE(bugprone-unchecked-optional-access)
auto diabetes = manager.get_disease_info(Identifier{"diabetes"}).value();
auto diabetes = manager.get_disease_info(Identifier{"diabetes"});
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved
auto info = DiseaseInfo{.group = DiseaseGroup::other,
.code = Identifier{"ghost369"},
.name = "Look at the flowers."};
Expand All @@ -235,8 +227,7 @@ TEST_F(DatastoreTest, DiseaseRelativeRiskToRiskFactor) {
using namespace hgps::core;

auto risk_factor = Identifier{"bmi"};
// NOLINTNEXTLINE(bugprone-unchecked-optional-access)
auto diabetes = manager.get_disease_info(Identifier{"diabetes"}).value();
auto diabetes = manager.get_disease_info(Identifier{"diabetes"});
jamesturner246 marked this conversation as resolved.
Show resolved Hide resolved

auto col_size = 8;

Expand Down