Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load nutrients #199

Merged
merged 21 commits into from
Aug 11, 2023
Merged

Load nutrients #199

merged 21 commits into from
Aug 11, 2023

Conversation

jamesturner246
Copy link
Contributor

@jamesturner246 jamesturner246 commented Aug 8, 2023

Fixes #162

I'll get this going, as the bulk of it is there.

I have a new example directory example_new, containing the new configs. France.NewEBM.json has another field: FoodsDataFile, specifying the CSV foods nutrients file similar to existing syntax in main config file./

the data is loaded inside model_parser load_newebm_model, and for now that's it. I did a little tidy up in csvparser as well. The helper functions are put in an anonymous namespace in csvparser.cpp, and declarations are removed from the header. This should make the interfaces a little less cluttered with helper fns.

I'll finalise and merge this after #200 merges, since it uses a bit of common code, which is now changing.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

return hc::case_insensitive::equals(pair.first, str);
std::map<std::string, std::string, hc::case_insensitive::comparator> csv_column_map;
for (const auto &[col_name, col_type] : columns) {
auto is_match = [&col_name](const auto &csv_col_name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'col_name' in capture list does not name a variable [clang-diagnostic-error]

        auto is_match = [&col_name](const auto &csv_col_name) {
                          ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a bug in clang, where it can't capture structured bindings in lambdas, even though C++20 allows it. https://stackoverflow.com/questions/46114214/lambda-implicit-capture-fails-with-variable-declared-from-structured-binding

Work to fix it here: https://reviews.llvm.org/D122768

For now I guess we don't use structured bindings here, unless we can work around that lambda.

std::map<std::string, std::string, hc::case_insensitive::comparator> csv_column_map;
for (const auto &[col_name, col_type] : columns) {
auto is_match = [&col_name](const auto &csv_col_name) {
return hc::case_insensitive::equals(col_name, csv_col_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: reference to local binding 'col_name' declared in enclosing function 'host::load_datatable_from_csv' [clang-diagnostic-error]

            return hc::case_insensitive::equals(col_name, csv_col_name);
                                                ^
Additional context

src/HealthGPS.Console/csvparser.cpp:111: 'col_name' declared here

    for (const auto &[col_name, col_type] : columns) {
                      ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@jamesturner246 jamesturner246 marked this pull request as ready for review August 9, 2023 14:32
Copy link
Contributor

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good. I've just made a comment on reusability of code blocks and another one on using Excel files.

example_new/MAFFCODE with NUTRIENTS and NPM scores.xlsx Outdated Show resolved Hide resolved
Comment on lines +169 to +172
file_path = config.file.name;
if (file_path.is_relative()) {
file_path = config.root_path / file_path;
config.file.name = file_path.string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same structure to make the path absolute is used in several places. I would suggest to write a small function that does the job and that you can reuse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this on my other branch, so maybe hold fire @jamesturner246. I'm happy to fix this up for you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, thanks.

Comment on lines +188 to +191
file_path = model.second;
if (file_path.is_relative()) {
file_path = config.root_path / file_path;
model.second = file_path.string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

Comment on lines +202 to +206
file_path = item.second;
if (file_path.is_relative()) {
file_path = config.root_path / file_path;
item.second = file_path.string();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes look sensible to me. I've made some minor suggestions.

I'm not entirely clear what the contents of the example_new folder are though, other than that they're intended to be ingested by the new model. How do the files differ from those in example/ if at all? Are they just going to be example data? Or were you intending to replace them with real-world data at some point? Real-world data should probably live somewhere outside this repo; example data's fine, but could we possibly use a smaller dataset? I know the existing example dataset is huge, but I don't know that we want to commit 100,000s of lines of data to the repo every time we add a new model. Maybe example_new needs a readme to explain things a bit?

I think we should probably merge this before #200, because a) it'll be ready sooner and b) the refactoring on #200 is more extensive, so I think it's easier to do it this way round. In future we probs need to coordinate a bit better because there are going to be a whole lot of merge conflicts here (I'm not saying this is your fault!).

310103,"Cornsnacks, based on maize",31.9,11.77,12.91,5.78,59.3,537,1.13,4.6,0,0,2.1,0,7,21
310104,Wheat based savoury snack,28,17.24,7.71,1.76,51.8,488,0.935,2.2,0,0,5.7,46,10.4,5
310201,Popcorn,20,1.72,9.48,7.01,77.6,480,0.056,62.1,61.8,0,1.8,18,2.1,15
310301,Other savoury snacks (including hors d'oeuvres),16.246,5.924,7.152,1.933,13.365,244.639,0.402103,0.287,0.012,0,0.53,57.59,12.12,0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that there's someone out there looking at the health impact of eating hors d'oeuvres 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope they put a lot of thought into it! xD

example_new/MAFFCODE with NUTRIENTS and NPM scores.xlsx Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a straight copy of the other France.HLM.json file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Temporary until the new initialisation is worked out. The dynamic model name is different: NewEBM, but that's it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely clear on what the difference between the current example and data folders is, but might this belong in a data folder instead? The other one has things like country codes in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all quite temporary for the moment, until data and the rest of the model comes in, and we can organise it properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better explanation: I wasn't really sure about this myself at first.

The data dir contains more permanent and widely applicable data, such as country codes (don't change often), diseases (also don't), etcetera.

The example dir contains input data for the current experiment, such as population initialisation and extrapolation data for France, at year 20xx.

There's probably some improvements to be made here, but, for now at least, there's separation of generally common and static data, and the data for the current experiment (year, country, etcetera). Hope that makes more sense.

Comment on lines +169 to +172
file_path = config.file.name;
if (file_path.is_relative()) {
file_path = config.root_path / file_path;
config.file.name = file_path.string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this on my other branch, so maybe hold fire @jamesturner246. I'm happy to fix this up for you

src/HealthGPS.Console/csvparser.cpp Outdated Show resolved Hide resolved
@jamesturner246
Copy link
Contributor Author

The examples_new is quite temporary. It's using parts of the older model, until the new model and data come in. I made a new dir temporarily so I can change things more freely without breaking the old one. Given the state of flux, I don't think we should prematurely optimise these until we actually know what's going in them.

Agree on the coordination point .Understand it's a bit annoying. it's quite hard to change things at the moment, given how tightly coupled the various config, settings, options, poco things are.

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

src/HealthGPS.Console/csvparser.cpp Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@jamesturner246 jamesturner246 merged commit e40b141 into main Aug 11, 2023
@jamesturner246 jamesturner246 deleted the load_nutrients branch August 11, 2023 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Populate food nutrient coefficients in France.NewEBM.json
3 participants