-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitise input data #200
Sanitise input data #200
Conversation
clang-tidy review says "All clean, LGTM! 👍" |
6aa55d0
to
0e35a28
Compare
clang-tidy review says "All clean, LGTM! 👍" |
They make more sense there.
0e35a28
to
57e0a72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
57e0a72
to
dc1a050
Compare
clang-tidy review says "All clean, LGTM! 👍" |
This is so we can test them with GTest.
dc1a050
to
7931f38
Compare
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
6e1313b
to
d857633
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just commented on a couple of very high level points. Otherwise, I think it looks good and, as far as I can tell, the split of functionality in new files makes sense.
I don't know if it is common in C++, but it might be useful to add a documentation header in each file explaining what the contents of that file are about, so developers - old and new - can learn what to expect in each case when reading the code. We should do the same in our python projects, by the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same way you have done with the configuration files, I miss some more thorough docstrings in this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope poco
is a clever acronym of something meaningful in Health-GPS. In Spanish, it means little
, so it always looks odd to me when reading the code :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently it means "plain old class object": https://imperialchepi.github.io/healthgps/architecture
I'm not sure that this naming is terribly informative though. And I don't know why these structs live in host::poco
rather than hgps::poco
either 🤷. Maybe we should rename it at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, some docstrings - at least for the functions you have touched - will be useful.
src/HealthGPS.Console/program.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there's no header file for this one? Not that matters, but I'm curious about why sometimes there is and sometimes there isn't, being rather novice in C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically because there are no other .cpp
that need any of the functions in program.cpp
, so we don't need to bother. It seems like most functions in HealthGPS are defined in headers, but a lot of these definitions could be dropped if we wanted because the functions aren't actually being used outside of a single .cpp
file.
These functions may give different values and notably do so on the Windows GitHub Runner.
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bunch safer, with some nice reuse of existing core code. 💯
There's quite a lot going on here, though. I wasn't able to go through with as much rigour as I'd like, so I've probably missed some things. I think breaking up these changes and planning/synchronising should be in our minds going forward in these kinds of big PRs.
I've picked some things that jumped out at me, and perhaps I'll have a few more once I digest it tomorrow.
src/HealthGPS.Console/jsonparser.h
Outdated
template <typename T> | ||
void to_json(nlohmann::json &j, const std::optional<hgps::core::Interval<T>> &p) { | ||
if (p) { | ||
j = *p; | ||
} else { | ||
// Null interval expressed as empty JSON array | ||
j = nlohmann::json::array(); | ||
} | ||
} | ||
|
||
template <typename T> | ||
void from_json(const nlohmann::json &j, std::optional<hgps::core::Interval<T>> &p) { | ||
// Treat null JSON values and empty arrays as a null Interval | ||
if (j.is_null() || (j.is_array() && j.empty())) { | ||
p = std::nullopt; | ||
} else { | ||
p = j.get<hgps::core::Interval<T>>(); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't needed, since they will be handled by the Interval<T>
to/from functions indirectly though the optional<T>
to/from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, but currently the code uses empty arrays to indicate a null interval rather than a JSON null value, which is really what it should be, so I added these helpers to avoid introducing a breaking change. I meant to add a comment saying as much, but forgot.
That said, I don't like this and I think the right solution is to do away with this special-casing and just use null values instead, though that would mean changing config files too. Shall I open an issue for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that the null values are preferred. I think I already started using null values elsewhere for things. /Go ahead.
@@ -5,16 +5,15 @@ find_package(TBB CONFIG REQUIRED) | |||
find_path(RAPIDCSV_INCLUDE_DIRS "rapidcsv.h") | |||
find_package(Threads REQUIRED) | |||
|
|||
add_executable(HealthGPS.Console "") | |||
target_compile_features(HealthGPS.Console PUBLIC cxx_std_${CMAKE_CXX_STANDARD}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why no target_compile_features(HealthGPS.Console PUBLIC cxx_std_${CMAKE_CXX_STANDARD})
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's redundant because targets are already built using ${CMAKE_CXX_STANDARD}
anyway.
std::map<std::string, std::string> columns; | ||
|
||
auto operator<=>(const FileInfo &rhs) const = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the spaceship operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new C++20 thing. If you add a default spaceship operator, then it automatically generates other comparison functions too (i.e. operator==
, operator<
etc.): https://en.cppreference.com/w/cpp/language/default_comparisons
Oddly you don't get a default operator==
otherwise, which seems a little odd to me.
// Data file information | ||
void to_json(json &j, const FileInfo &p) { | ||
j = json{ | ||
{"name", p.name}, {"format", p.format}, {"delimiter", p.delimiter}, {"columns", p.columns}}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's currently used by one of the unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, mu bad.
// SES Model Information | ||
void to_json(json &j, const SESInfo &p) { | ||
j = json{{"function_name", p.function}, {"function_parameters", p.parameters}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... in fact, I don't think any of the POCO-type to_json
functions are actually needed anywhere. The only JSON writing is internal hgps event messages, and done in result_file_writer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same and went through deleting a bunch of them, but then when I got round
to writing unit tests I found they were useful for that.
} | ||
|
||
// NOLINTNEXTLINE(bugprone-exception-escape) | ||
void rebase_valid_path_to(const json &j, const std::string &key, std::filesystem::path &out, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the return-by-reference also needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not, really. It's just that out
might not be updated. Maybe in that case,
returning an empty std::filesystem::path
wouldn't be terrible though.
Alternatively, this function could just return a bool
indicating whether it's
succeeded, which is what I've done elsewhere (e.g. for the get_to
function). That
would at least be consistent.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant that there's two rebase_valid_path_to
, one on ln 34 which returns success
as the return value, and another which returns success
via a bool
reference on ln 51. Are both necessary?
Yeah, for the sake of consistency, the boolean flag method is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overload returning a bool
isn't actually used anywhere, but I figured we might as
well keep it because we might do in future and it kind of makes sense to implement the
other overload using it anyway. That said, it's up to you. I could get rid of it and it
would save us a few lines of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this header is more for risk factor model JSON parsing stuff, right? Was the plan to have a separate header for the parsing interfaces of each model type? Maybe renaming this something like configuration_parsing_risk_factor.h
would clarify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. The reason I put these bits into their own header was really just because it felt like configuration_parsing.h
was getting a bit cluttered and these functions only needed to be exported so we could access them for the tests.
Co-authored-by: James Paul Turner <[email protected]>
Thanks for the review!
Yeah, sorry about that 😞. If I'd split this up into smaller PRs it also would have made it less likely we'd end up with merge conflicts too. For the next big refactor I'll plan ahead a bit more rather than just winging it. |
clang-tidy review says "All clean, LGTM! 👍" |
This reverts commit 0157f76.
clang-tidy review says "All clean, LGTM! 👍" |
We don't need the extra option to be able to use an empty array. Let's make users express null Intervals with a JSON null value.
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead 👍
clang-tidy review says "All clean, LGTM! 👍" |
So I think this is finally ready for review. I'm afraid it's rather on the long side if you go by the diffstat, however a lot of the change is just moving things from one file to another and much of the rest is tests so I hope you'll forgive me 😉.
The goal of this PR was to refactor the config parsing code to make loading JSON files safer. Currently, exceptions are only used intermittently and in many places, if the JSON file is malformed then a warning will be emitted, but the simulation will still run, just using default values. I've tried to make these config parsing errors as informative as possible, by printing error messages whenever malformed values are found and only throwing an exception at the last possible point. I've tried not to modify the file format in any way and the existing example data files load (seemingly) correctly.
I've also added tests for all of the config parsing code as well as (I think) any other bits I've changed, so hopefully we won't break any of this going forward.
I've rearranged files somewhat too. In particular,
configuration.cpp
ended up rather long by the time I'd finished with it, so I moved the code for parsing command line options into its own files as well as the lower-level config parsing code.Finally, while I was working on this I introduced smatterings of type changes all over to make the types more descriptive and safer, e.g. converting
std::string
s tostd::filesystem::path
s where appropriate.Fixes #161.