-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to emit the root document if the desired sub-document does not exist #189
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good and feature makes sense. Just had some small suggestions on testing.
tests/test_minilinq.py
Outdated
Reference("id"), Reference('baz'), Reference('$.id'), Reference('$.foo.id'), Reference('$.foo.name') | ||
])) | ||
self.check_case(mmap.eval(env), [ | ||
['1.bar.1.bar.[0]', 'a1', '1', '1.bid', 'zip'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id
is parent_id.key.child_id
? and then if no child id is present then it's parent_id.key.parent_id.key[index]
?
(mostly just confused by the first entry in this list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if I'm being honest I'm having a hard time wrapping my head around several of these test cases. I appreciate the density of the test, but I wonder if some comments or named test cases might help a bit? Or if you think it's clear and the problem is my own understanding of minilinq that's fair too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is confusing and results from both a bug in jsonpath-rw and a bug in commcare-export:
- Don't clobber existing IDs with 'auto_id' kennknowles/python-jsonpath-rw#96
- remove additional auto-id field #188
I would like to fix these issues but it would results in breaking certain exports (any that are using the feature that allows you to specify the root doc expression in the Data Source
column.
I'm going to think about how we could do the rollout. If you have thought's I'd be keen to know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way they break is that the IDs change, yeah? So if one were using SQL, and, e.g. a case was modified, then you could end up with duplicate rows for the children?
Are you worried about migrating exports that Dimagi maintains or breaking external usages? In general introducing breaking changes seems maybe acceptable so long as it's a major version bump and it's clearly documented (including how to fix - maybe in this case just by wiping your data and starting over?). Certainly the expectation would be that one should read release notes for any major library upgrade.
We could also add a warning statement in a point release to log when someone would be affected by the upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way they break is that the IDs change, yeah? So if one were using SQL, and, e.g. a case was modified, then you could end up with duplicate rows for the children?
That's correct.
Are you worried about migrating exports that Dimagi maintains or breaking external usages? In general introducing breaking changes seems maybe acceptable so long as it's a major version bump and it's clearly documented (including how to fix - maybe in this case just by wiping your data and starting over?). Certainly the expectation would be that one should read release notes for any major library upgrade.
I'm don't know of any that Dimagi maintains which use this feature. But if Dimagi is maintaining it we could work around the issue. I'm mostly concerned about those 'in the wild'.
We could also add a warning statement in a point release to log when someone would be affected by the upgrade.
Yes, that's also a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if I'm being honest I'm having a hard time wrapping my head around several of these test cases. I appreciate the density of the test, but I wonder if some comments or named test cases might help a bit? Or if you think it's clear and the problem is my own understanding of minilinq that's fair too.
Refactored tests: 4d194d2 (much better, thanks for the prompt)
tests/test_minilinq.py
Outdated
['1.bar.1.bar.[0]', 'a1', '1', '1.bid', 'zip'], | ||
['1.bar.bazzer', 'a2', '1', '1.bid', 'zip'], | ||
['2', [], '2', '2.bid', 'zap'], | ||
['3.bar.3.bar.[0]', [], '3', '3.bid', 'mip'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also find it odd how different the id
is in these two examples. it's because the dict counts as an entry but an empty list does not?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your confusion is well deserved. There are some 'bugs' and oddities in commcare-export as well as jsonpath-rw that cause this. I pushed a test to master earlier in the day to 'document' the weirdness: https://github.com/dimagi/commcare-export/blob/master/tests/test_minilinq.py#L58
I am planning to replace jsonpath-rw with jsonpath-ng which has a bunch of bugfixes and improvements and will resolve this specific issue (for this case 'id' will become '3').
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks I missed that. Sounds like I can mostly ignore the first column in these then? If so then everything is much clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(could we remove the id column from this test if it's already tested elsewhere and not part of what's being tested here?)
edit: actually I don't think this is a good idea since it is relevant to disambiguate what the base object is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/test_minilinq.py
Outdated
value_or_root = get_value_or_root_expression('bar.[*]') | ||
flatmap = FlatMap(source=Literal(data), body=value_or_root) | ||
mmap = Map(source=flatmap, body=List([ | ||
Reference("id"), Reference('baz'), Reference('$.id'), Reference('$.foo.id'), Reference('$.foo.name') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be worth adding a test case where baz
exists on the parent (and the child is empty)? Is the behavior there that it would use it or that it would remain empty if it has fallen back to the parent emit (my guess is the desired behavior is to be empty)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question, it would actually take the value from the parent:
data = {"id": 6, "foo": {'id': 'bid', 'name': 'mop'}, "baz": "root_bazz"}
# id, baz, $.id, $.foo.id, $.foo.name
result = ['6', 'root_bazz', '6', '6.bid', 'mop']
That definitely isn't great. I have no idea how to work around that. I might be able to hack it by setting a special var in the env if we're in the case where we're using the root doc and then change the JsonPathEnv to only allow 'root' expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - I guess it's not possible to emit something like {'$': root}
instead of the root document itself? (not sure if that even makes sense)
I don't know that it's an important enough edge-case to worry about, though does seem worth at least documenting the "expected" behavior in a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not loving this but.... 8ac159f
This works around the special case where we have replaced the doc with the root doc. In this case any field reference that is not 'root based' should be ignored.
looks like the other test now also needs to be updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good by me once tests pass. Thanks for filling in the gaps in my knowledge!
tests/test_minilinq.py
Outdated
Reference("id"), Reference('baz'), Reference('$.id'), Reference('$.foo.id'), Reference('$.foo.name') | ||
])) | ||
self.check_case(mmap.eval(env), [ | ||
['1.bar.1.bar.[0]', 'a1', '1', '1.bid', 'zip'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way they break is that the IDs change, yeah? So if one were using SQL, and, e.g. a case was modified, then you could end up with duplicate rows for the children?
Are you worried about migrating exports that Dimagi maintains or breaking external usages? In general introducing breaking changes seems maybe acceptable so long as it's a major version bump and it's clearly documented (including how to fix - maybe in this case just by wiping your data and starting over?). Certainly the expectation would be that one should read release notes for any major library upgrade.
We could also add a warning statement in a point release to log when someone would be affected by the upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for test cleanup!
The focus of the changes here is to allow exporting data from the messaging event API even when there are no messages. Each event may have multiple messages so in order to get one row per message we need an expression like this:
However if there are no messages, as is the case when the event errors, we still want to export one row for the event. To accomplish this a new command line option is added to the CLI:
When this option is used in conjunction with a Data Source query like those given in the examples, the minilnq expression
is adjusted so that if the main expression returns no results, the root document is used instead:
The
_or_raw
function is similar toor
but it does not fully unwrap the values.